Introduction


Who are we? Where do we come from?
These are some of the most baffling questions of the genomics era. The
answers probably lie in our genome itself.


Human individualization based on the genome exploits the fact that everyone 
except for identical twins is genetically distinguishable. Moreover, genetic 
material is found in every nucleated cell in the body and can be recovered 
from samples as diverse as bone, blood stains, saliva residues, nasal 
secretions, and even fingerprints (Hoff-Olson et al., 1999; Schiffner et al., 
2005; Wurmb-Schwark et al., 2006). DNA may be recovered from very old 
samples that have been well preserved (Weir, 2001). Over the past 60 years,
DNA has risen from being an obscure molecule with presumed accessory or
structural functions inside the nucleus to the icon of modern bioscience 
(Alberts et al., 2002; Primrose & Twyman, 2003). New tools of molecular 
biology have enabled forensic scientists to characterize biological evidence at 
the DNA level (James & Nordby, 2005; Thompson & Black, 2007). The 
ability to type DNA from biological evidence is one of the most important 
developments in forensic science. DNA technology affords the forensic 
scientist the ability to eliminate individuals who have been falsely associated 
with a biological sample and to reduce the number of potential contributors to 
a few (if not one) individuals (Gardner et al., 2002; Watson et al., 2004). The
technology today includes a number of genetic markers, a variety of valid DNA
typing strategies, and analytical software, all of which make developing DNA
profiles and searching DNA databanks relatively rapid and facile (Butler,
2012). Because cases can be analyzed and DNA databanks generated more
rapidly than a decade ago, DNA databanking can be established and used to
search DNA profiles and records to help resolve a number of violent crimes.


1.1 DNA - THE BLUEPRINT OF LIFE


The year 1869 was a landmark in genetic research: it was the year in
which Swiss physician Friedrich Miescher first identified nucleic acid, or
DNA, which he named “nuclein” (Brown, 2002; Wolf, 2003; Hedrick, 2005;
Dahm, 2008). Russian biochemist Phoebus Levene was the first to discover
the order of the three major components of a single nucleotide (phosphate- 
sugar-base), the carbohydrate component of RNA (ribose) and DNA 
(deoxyribose) (Levene, 1919). Erwin Chargaff was one of a handful of 
scientists who expanded on Levene's work by uncovering additional details 
about the structure of DNA (Kendall & Osterberg, 1919; Hershey & Chase, 
1952; Chargaff, 1950, 1971). The story of DNA often seems to begin in 1944 
with Avery, MacLeod, and McCarty showing that DNA is the hereditary 
material (Avery et al., 1944; McCarty, 1994). Without the scientific 
foundation provided by these pioneers, Watson and Crick may never have 
reached their groundbreaking conclusion of 1953: that the DNA molecule 
exists in the form of a three-dimensional double helix (Wilkins et al., 1951, 
1953; Watson & Crick, 1953; Franklin & Gosling, 1953; Pauling & Corey, 
1953; Rich & Watson, 1954). Within another decade, the
genetic code was cracked (Nirenberg et al., 1963, 1966; Lederberg, 1994;
Hartl & Clark, 2006; Lewin, 2007; Gann & Witkowski, 2012). 


DNA is present in the nucleus along with histone proteins in the form of
highly coiled structures called chromosomes (Painter, 1921, 1923). The
haploid human genome contains approximately 3 billion base pairs of DNA
packaged into 23 chromosomes (Venter et al., 2001). The human diploid
chromosome number was correctly determined as 46 by J-H Tjio and A Levan
at the University of Lund, Sweden, in December 1955 (Hsu, 1952; Tjio &
Levan, 1956; Speicher et al., 1996; Trask, 2002; Gartler, 2006). The
structure of the chromosome and the packaging of DNA were first described
by Thomas & Kornberg (1975). Humans inherit one set of
chromosomes from their mother and a second set from their father. In total, 
most human cells contain 46 chromosomes with 22 pairs of autosomes, or 
non-sex chromosomes, and two sex-determining chromosomes. The sex 
chromosomes in humans are called X and Y. Females carry two X 
chromosomes, while males carry one X and one Y chromosome (Ford & 
Hamerton, 1956). The Y chromosome has often been used as a marker for 
studying human demographic history (Painter, 1923; Stern, 1957; Jobling & 
Tyler-Smith, 2003). The Y chromosome does not undergo homologous
recombination, except in the small pseudoautosomal regions (Whitfield et al.,
1995; Pritchard et al., 1999; Mangs & Morris, 2007; Devlin, 2010).

Figure 1.1: Karyotype of normal human male (Hsu, 1952).


1.2 DNA POLYMORPHISM


The human genome is composed of approximately 3 billion base pairs 
organized into an estimated 30,000 genes. The human genome has many 
repeated sequences (Britten & Kohne, 1968; Armour et al., 1989; Venter et
al., 2001; Ellegren, 2004). Tandem repeats are arrays of consecutive repeats.
They include three sub-classes: satellites, minisatellites and microsatellites.
The name “satellite” comes from the way these sequences behave in buoyant
density gradient centrifugation: DNA fragments with significantly different
base compositions separate into distinct bands, which are monitored by their
absorption of ultraviolet light. The main band represents the bulk DNA, and the
“satellite” bands originate from tandem repeats (Britten & Kohne, 1968; 
Housman, 1995; Collins et al., 1998; Bailey et al., 2002; Primrose & 
Twyman, 2003; Cooper, 2006). The term “polymorphism” describes the
existence of different forms within a population, e.g., differences in the
number of tandem repeats. All tandem repeat polymorphisms could result from
DNA recombination during meiosis (Biemont & Vieira, 2006). Microsatellite
polymorphism could also be caused by replication slippage (Cooper, 2006).


1.2.1 SATELLITES 

The size of satellite DNA ranges from 100 kbp to over 1 Mbp. In humans, a
well-known example is the alphoid DNA located at the centromere of all the
chromosomes. Its repeat unit is 171 bp and the repetitive region accounts for
3-5% of the DNA in each chromosome (Britten & Kohne, 1968; Goodbourn
et al., 1983; Gomolka et al., 1994). Other satellites have a shorter repeat unit.
Most satellites in humans or in other organisms are located at the centromere.


1.2.2 MINISATELLITES

The size of a minisatellite ranges from 1 kbp to 20 kbp. Minisatellites are also
known as variable number of tandem repeats (VNTRs). Their repeat unit ranges
from 9 bp to 80 bp (Bell et al., 1982; Armour et al., 1989; Horn et al., 1989).
They are located in non-coding regions. The number of repeats for a given
minisatellite may differ between individuals. This feature is the basis of DNA
fingerprinting (Jeffreys et al., 1985; O'Connell et al., 1988; Kasai et al.,
1990). Another type of minisatellite is the telomere. In a human germ cell,
the size of a telomere is above 15 kb. In aging somatic cells, the telomere is
shorter. The telomere contains the tandemly repeated sequence GGGTTA (Wong
et al., 1987; Tautz, 1989). Another example of a VNTR is the forensic DNA
marker D1S80. The D1S80 marker is a minisatellite with a 16 bp repeat unit
and contains alleles in the range of 16-41 repeats (Budowle et al., 1991).


1.2.3 MICROSATELLITES 

Microsatellites are also known as short tandem repeats (STRs), because a
repeat unit consists of only 2 to 7 bp and the whole repetitive region spans
less than 150 bp. STRs were first reported in the late 1980s and the number of
repeats for a given microsatellite may differ between individuals (Weber et
al., 1989; Watson et al., 2004). Therefore, microsatellites can also be used for
DNA fingerprinting. In addition, both microsatellite and minisatellite
patterns can provide information about paternity (Edwards et al., 1991;
Collins et al., 2003; Urquhart et al., 1993). The most famous case is that of
President Thomas Jefferson and his alleged sons (Foster et al., 1998). One of
the greatest mysteries for most of the twentieth century was the fate of the
Romanov family, the last Russian monarchy; the case was solved by
combined analysis of autosomal and Y-chromosomal STRs (Coble et al.,
2009).




“Whenever you have excluded the impossible, whatever remains, however 
improbable, must be the Truth.” 

- Sir Arthur Conan Doyle 

1.3 DNA FINGERPRINTING


The process of “DNA fingerprinting” or DNA profiling was first described in
1985 by the English geneticist Alec Jeffreys. The human genome is full
of repeated DNA sequences. These repeated sequences come in various sizes 
and are classified according to the length of the core repeat units, the number 
of contiguous repeat units, and/or the overall length of the repeat region 
(Housman, 1995). The number of repeated sections present in a sample could 
differ from individual to individual. Sir Alec Jeffreys developed a technique to 
examine the length variation of these DNA repeated sequences to perform 
human identity tests (Jeffreys et al., 1985, 1986; Wong et al., 1986, 1987). 
These repeated DNA sequences are known as VNTRs (variable number of 
tandem repeats). The technique used by Dr. Jeffreys to examine the VNTRs 
was called restriction fragment length polymorphism (RFLP) because it 
involved the use of a restriction enzyme (Meselson & Yuan, 1968) to cut the 
regions of DNA surrounding the VNTRs. This RFLP method was first used to
help in an English immigration case and shortly thereafter to solve a double
homicide case in the UK (Jeffreys et al., 1985). Since that time, human identity
testing using DNA typing methods has become widespread (Marroni, 2001;
Hood & Galas, 2003). 


Any material that contains nucleated cells, including blood, semen, saliva, 
hair, bones, and teeth, potentially can be typed for DNA polymorphisms 
(Higuchi et al., 1988; Kasai et al., 1990; Walsh et al., 1991; Hochmeister et 
al., 1991). The typing of VNTR loci by RFLP analysis is the most
discriminating, or individualizing, molecular biology technology for forensic
identity testing. Although this approach is valid and reliable for forensic and
paternity testing, it has certain limitations. These include:

1) Sufficient quantity of high molecular weight DNA (usually at least 50 ng) 
is required for RFLP analysis. 

2) Samples that have been substantially degraded cannot be analyzed by 
RFLP typing. 


3) RFLP analysis is laborious as well as time-consuming. 


An alternative strategy for forensic DNA typing is the use of STRs through
PCR-based assays. DNA regions with short repeat units (usually 2 to 6 bp in
length) are called short tandem repeats (STRs). STRs have several benefits
that make them especially suitable for human identification. Compared with
the RFLP approach, the advantages of PCR-based technologies include
greater sensitivity and specificity and decreased assay time and labor. Also,
many degraded DNA samples can be amplified by PCR and subsequently typed
because amplified alleles generally are much smaller in size than alleles
detected by RFLP analysis. These features make PCR a particularly useful tool
for analyzing biological material found at crime scenes (Comey & Budowle,
1991; Reynolds et al., 1991).


STRs have become popular DNA repeat markers because they are easily
amplified by the polymerase chain reaction without the problems of
differential amplification. This is because both alleles from a heterozygous
individual are similar in size, since the repeat size is small (Dib et al.,
1996). The number of repeats in STR markers can be highly variable among
individuals, which makes these STRs effective for human identification
purposes (Bar et al., 1997; Thompson & Black, 2007).


1.4 PCR-BASED GENETIC MARKERS (STRs)


The polymerase chain reaction (PCR) has revolutionized molecular biology with
the ability to make hundreds of millions of copies of a specific sequence of 
DNA in a matter of only a few hours (Mullis et al., 1986; Mullis et al., 1987). 
Without the ability to make copies of DNA molecules, many forensic samples 
would be impossible to analyze. DNA from crime scenes is often limited in 
both quantity and quality and obtaining a cleaner, more concentrated sample is 
normally out of the question. The PCR DNA amplification technology is well 
suited to analysis of forensic DNA samples because it is sensitive, rapid, and 
not limited by the quality of the DNA as are the restriction fragment length 
polymorphism (RFLP) methods (Jeffreys et al., 1985; Jeffreys et al., 1986; 
Sajantila et al., 1992). 
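
The scale of amplification described above follows from simple doubling
arithmetic. The sketch below is illustrative only: it assumes an idealized
reaction in which every cycle multiplies the number of target copies by
(1 + efficiency), where an efficiency of 1.0 means perfect doubling. The
function names and the example figures are hypothetical, and real reactions
reach a plateau in later cycles that this model ignores.

```python
def amplified_copies(initial_copies, cycles, efficiency=1.0):
    """Idealized copy number after `cycles` rounds of PCR.

    Assumes every cycle multiplies the target by (1 + efficiency);
    plateau effects in late cycles are not modelled.
    """
    if initial_copies <= 0:
        raise ValueError("initial_copies must be positive")
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be greater than 0 and at most 1")
    return initial_copies * (1.0 + efficiency) ** cycles


def cycles_to_reach(target_copies, initial_copies=1, efficiency=1.0):
    """Smallest number of cycles that yields at least `target_copies`."""
    cycles = 0
    while amplified_copies(initial_copies, cycles, efficiency) < target_copies:
        cycles += 1
    return cycles
```

Under these assumptions a single template molecule exceeds one hundred
million copies after 27 cycles, since 2^27 is roughly 1.3 x 10^8.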


PCR permits more than one region of DNA to be copied simultaneously by 
simply adding more than one primer set to the reaction mixture. The 
simultaneous amplification of two or more regions of DNA is commonly 
known as multiplexing or multiplex PCR (Bosch et al., 2002). For a multiplex 
reaction to work properly, the primer pairs need to be compatible. In other
words, the primer annealing temperatures should be similar and excessive
regions of complementarity should be avoided to prevent the formation of
primer dimers, in which the primers bind to one another instead of the
template DNA. Each new primer added to a multiplex PCR reaction sharply
increases the number of possible primer interactions. The multiplex PCR
technique is used extensively in STR-based DNA fingerprinting (Butler et al.,
2001; Butler et al., 2002; Schoske et al., 2003). Considerable time and effort
can be saved by simultaneously amplifying multiple sequences in a single
reaction. Multiplex PCR requires that
primers lead to amplification of unique regions of DNA, both in individual 
pairs and in combinations of many primers, under a single set of reaction 
conditions. In addition, methods must be available for the analysis of each 
individual amplification product from the mixture of all the products. 
Multiplex PCR is becoming a rapid and convenient screening assay in both 
the clinical and the research laboratory. The development of an efficient 
multiplex PCR usually requires strategic planning and multiple attempts to 
optimize reaction conditions. For a successful multiplex PCR assay, the 
relative concentration of the primers, concentration of the PCR buffer, balance 
between the magnesium chloride and deoxynucleotide concentrations, cycling 
temperatures, and amount of template DNA and Taq DNA polymerase are 
important. An optimal combination of annealing temperature and buffer 
concentration is essential in multiplex PCR to obtain highly specific 
amplification products (D'Aquila et al., 1991). Magnesium chloride
concentration needs only to be proportional to the amount of dNTP, while
adjusting primer concentration for each target sequence is also essential. The
list of factors that can influence the reaction is by no means complete.
Optimization of the parameters discussed above should provide a practical
approach toward resolving the common problems encountered in multiplex PCR
(such as spurious amplification products, uneven or no amplification of some
target sequences, and difficulties in reproducing some results). Thorough
evaluation and validation of new multiplex PCR procedures is essential. The
sensitivity and specificity must be thoroughly evaluated using standardized
purified nucleic acids (Mullis et al., 1986; Mullis et al., 1987; Markoulatos
et al., 2002).
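
The primer compatibility requirements described above (similar annealing
temperatures, no extensive complementarity between primers) can be screened
roughly in software before any bench work. The following is a minimal sketch,
not a validated design tool. It assumes the Wallace rule of thumb for short
oligonucleotides, Tm = 2(A+T) + 4(G+C), and a crude dimer check that only asks
whether the last few 3' bases of one primer can pair with any stretch of
another. The function names, the 5 degree spread and the 4-base tail are
hypothetical choices made for illustration.

```python
from itertools import combinations_with_replacement

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def wallace_tm(primer):
    """Rough melting temperature (deg C) by the Wallace rule: 2(A+T) + 4(G+C)."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def three_prime_overlap(primer_a, primer_b, tail=4):
    """True if the last `tail` bases of primer_a can pair with part of primer_b."""
    return reverse_complement(primer_a[-tail:]) in primer_b.upper()


def interaction_count(n_primers):
    """Number of pairwise primer interactions, self-pairs included."""
    return n_primers * (n_primers + 1) // 2


def screen_primers(primers, max_tm_spread=5, tail=4):
    """Return (Tm per primer, whether Tm values agree, flagged primer pairs).

    `primers` maps a primer name to its 5'->3' sequence. The thresholds are
    illustrative assumptions, not recommended values.
    """
    tms = {name: wallace_tm(seq) for name, seq in primers.items()}
    tm_ok = max(tms.values()) - min(tms.values()) <= max_tm_spread
    flagged = []
    for a, b in combinations_with_replacement(primers, 2):
        if (three_prime_overlap(primers[a], primers[b], tail)
                or three_prime_overlap(primers[b], primers[a], tail)):
            flagged.append((a, b))
    return tms, tm_ok, flagged
```

Because every primer can in principle interact with every other primer and
with itself, `interaction_count` grows quadratically: 4 primers give 10
combinations to check, 20 primers give 210.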


For human identification purposes, it is important to have DNA markers that
exhibit the highest possible variation in order to discriminate between samples
(Hammond et al., 1994). The smaller size of STR alleles makes STR markers
better candidates for use in forensic applications, in which degraded DNA is
common. PCR amplification of degraded DNA samples can be better
accomplished with smaller target product sizes. Because of their smaller size,
STR alleles can also be separated from other chromosomal locations more
easily to ensure closely linked loci are not chosen. Closely linked loci do not
follow the predictable pattern of random distribution in the population,
making statistical analysis difficult. STR alleles also have lower mutation
rates, which make the data more stable and predictable. Because of these
characteristics, STRs with higher power of discrimination are routinely chosen
for human identification in forensic cases. They are used to identify victims,
perpetrators, and missing persons, and for personal identification in mass
disasters (Butler et al., 2001; Butler et al., 2002).
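
The notion of a locus having a "higher power of discrimination" can be made
concrete. One common definition is PD = 1 - (sum of squared genotype
frequencies), i.e. the probability that two randomly chosen individuals have
different genotypes at the locus. The sketch below is a hedged illustration:
it assumes Hardy-Weinberg proportions to derive genotype frequencies from
allele frequencies, and it assumes the loci are independent when combining
them, which is why closely linked loci are avoided. All function names are
hypothetical.

```python
from itertools import combinations


def genotype_frequencies(allele_freqs):
    """Expected genotype frequencies under Hardy-Weinberg proportions.

    `allele_freqs` maps allele label -> population frequency (summing to 1).
    Homozygotes have frequency p^2, heterozygotes 2pq.
    """
    alleles = sorted(allele_freqs)
    freqs = {(a, a): allele_freqs[a] ** 2 for a in alleles}
    for a, b in combinations(alleles, 2):
        freqs[(a, b)] = 2 * allele_freqs[a] * allele_freqs[b]
    return freqs


def power_of_discrimination(allele_freqs):
    """PD = 1 - sum of squared genotype frequencies for one locus."""
    genotypes = genotype_frequencies(allele_freqs)
    return 1.0 - sum(f * f for f in genotypes.values())


def combined_power_of_discrimination(pd_values):
    """Combine per-locus PD values, assuming the loci are independent."""
    all_match = 1.0
    for pd in pd_values:
        all_match *= 1.0 - pd
    return 1.0 - all_match
```

For a hypothetical locus with two equally frequent alleles, the genotype
frequencies are 0.25, 0.5 and 0.25, so PD = 1 - 0.375 = 0.625; two such
independent loci together give 1 - 0.375^2, about 0.86. Loci with many
alleles of similar frequency score much higher.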


1.4.1 TYPES OF STR MARKERS 

STR repeat sequences are named by the length of the repeat unit. Dinucleotide
repeats have two nucleotides repeated next to each other over and over again.
Trinucleotides have three nucleotides in the repeat unit, tetranucleotides have
four, pentanucleotides have five, and hexanucleotides have six nucleotides in
the core repeat. Tetranucleotide repeats have become the most popular STR
markers for human identification. STR sequences not only vary in the length
of the repeat unit and the number of repeats but also in the rigor with which
they conform to an incremental repeat pattern (Tautz et al., 1993; Urquhart et
al., 1994). STRs are often divided into several categories based on the repeat
pattern. Simple repeats contain units of identical length and sequence,
compound repeats comprise two or more adjacent simple repeats, and
complex repeats may contain several repeat blocks of variable unit length as
well as variable intervening sequences. Complex hypervariable repeats also
exist with numerous non-consensus alleles that differ in both size and
sequence and are therefore challenging to genotype reproducibly. This last
category of STR markers is not commonly used in forensic DNA typing due
to difficulties with allele nomenclature and measurement variability between
laboratories (Butler, 2001, 2005).
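
As a rough illustration of the simple and compound repeat categories, the
sketch below finds the longest uninterrupted run of a given motif in a
sequence. The function name and the example sequences used with it are
hypothetical; in practice alleles are designated from fragment size or
sequence data according to published nomenclature rules, not by a motif
search of this kind.

```python
def longest_tandem_run(sequence, motif):
    """Largest number of consecutive, uninterrupted copies of `motif`."""
    sequence = sequence.upper()
    motif = motif.upper()
    if not motif:
        raise ValueError("motif must not be empty")
    step = len(motif)
    best = 0
    # Try every start position so that runs in any reading frame are found.
    for start in range(len(sequence) - step + 1):
        run = 0
        pos = start
        while sequence[pos:pos + step] == motif:
            run += 1
            pos += step
        best = max(best, run)
    return best
```

With a made-up simple repeat such as "GGTC" + "TCAT" * 7 + "AGGC" the function
reports 7 copies of TCAT. For a made-up compound repeat built as
"TCTA" * 3 + "TCTG" * 2 + "TCTA" * 5 it reports a longest TCTA block of 5 and
a TCTG block of 2, which shows why a single count is not enough to describe
compound or complex loci.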


Among the various types of STR systems, tetranucleotide repeats have
become more popular than di- or trinucleotides. Penta- and hexanucleotide
repeats are less common in the human genome but are being examined by
some laboratories. Stutter product amounts vary depending on the STR locus
but are usually less than 15% of the allele product quantity with
tetranucleotide repeats. With di- and trinucleotides, the stutter percentage
can be much greater (30% or more), making it difficult to interpret sample
mixtures. In addition, the four-base spread in alleles with tetranucleotides
makes closely spaced heterozygotes easier to resolve with size-based
electrophoretic separations compared to alleles that could be two or three
bases different in size with dinucleotide and trinucleotide markers,
respectively (Butler, 2010).
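
The stutter percentages quoted above are ratios of peak heights. A minimal
sketch of how such a filter might look is given below. It assumes that
stutter appears one repeat unit shorter than the true allele and uses the 15%
figure from the text as a default threshold. The function names and the
dictionary layout (repeat number mapped to peak height) are hypothetical, and
laboratories apply their own validated, locus-specific thresholds.

```python
def stutter_ratio(stutter_height, parent_height):
    """Height of the candidate stutter peak relative to its parent allele."""
    if parent_height <= 0:
        raise ValueError("parent_height must be positive")
    return stutter_height / parent_height


def flag_stutter(peaks, threshold=0.15):
    """Return repeat numbers whose peaks are consistent with stutter.

    `peaks` maps an integer repeat number to a peak height. A peak is
    flagged when a peak exactly one repeat longer exists and the ratio of
    the two heights does not exceed `threshold`.
    """
    flagged = []
    for allele, height in sorted(peaks.items()):
        parent = peaks.get(allele + 1)
        if parent and stutter_ratio(height, parent) <= threshold:
            flagged.append(allele)
    return flagged
```

In a mixture, a minor contributor's true allele can sit in a stutter position
and have a similar height, which is why high stutter at di- and trinucleotide
loci complicates interpretation.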


1.4.2.1 Autosomal Short Tandem Repeats 

For DNA typing markers to be effective across a wide number of
jurisdictions, a common set of standardized markers must be used. The STR
loci that are commonly used today were initially characterized and developed
at the Baylor College of Medicine (Houston, TX) and at the Forensic Science
Service in England (Edwards et al., 1991; Puers et al., 1993; Hammond et al.,
1994). The Promega Corporation (Madison, WI) initially commercialized many
markers, while Applied Biosystems (Foster City, CA) incorporated some new
markers. The STR project, which began in April 1996 and concluded in
November 1997, involved 22 DNA typing laboratories and the evaluation of 17
candidate STR loci. The evaluated STR loci were CSF1PO, F13A01, F13B,
FES/FPS, FGA, LPL, TH01, TPOX, VWA, D3S1358, D5S818, D7S820, D8S1179,
D13S317, D16S539, D18S51, and D21S11. Details of some commonly used
autosomal STRs are given in Table 1.1.


Table 1.1: Autosomal STR loci (Butler, 2006)

Locus   | Chromosomal location                         | Physical position | Category & repeat motif | Allele range
TPOX    | 2p25.3, thyroid peroxidase, 10th intron      | Chr 2, 1.472 Mb   | Simple GAAT             | 4-16
D2S1338 | 2q35                                         | Chr 2, 218.705 Mb | Compound TGCC/TTCC      | 15-28
D3S1358 | 3p21.31                                      | Chr 3, 45.557 Mb  | Compound TCTG/TCTA      | 8-21
FGA     | 4q31.3, alpha fibrinogen, 3rd intron         | Chr 4, 155.866 Mb | Compound CTTT/TTCC      | 12.2-51.2
D5S818  | 5q23.2                                       | Chr 5, 123.139 Mb | Simple AGAT             | 7-18
CSF1PO  | 5q33.1, c-fms proto-oncogene, 6th intron     | Chr 5, 149.436 Mb | Simple TAGA             | 5-16
SE33    | 6q14, beta actin-related pseudogene          | Chr 6, 89.043 Mb  | Complex AAAG            | 4.2-37
D7S820  | 7q21.11                                      | Chr 7, 83.433 Mb  | Simple GATA             | 5-16
D8S1179 | 8q24.13                                      | Chr 8, 125.976 Mb | Compound TCTA/TCTG      | 7-20
TH01    | 11p15.5, tyrosine hydroxylase, 1st intron    | Chr 11, 2.149 Mb  | Simple TCAT             | 3-14
VWA     | 12p13.31, von Willebrand factor, 40th intron | Chr 12, 5.963 Mb  | Compound TCTG/TCTA      | 10-25
D13S317 | 13q31.1                                      | Chr 13, 81.620 Mb | Simple TATC             | 5-16
Penta E | 15q26.2                                      | Chr 15, 95.175 Mb | Simple AAAGA            | 5-24
D16S539 | 16q24.1                                      | Chr 16, 84.944 Mb | Simple GATA             | 5-16
D18S51  | 18q21.33                                     | Chr 18, 59.100 Mb | Simple AGAA             | 7-40
D19S433 | 19q12                                        | Chr 19, 35.109 Mb | Compound AAGG/TAGG      | 9-17.2
D21S11  | 21q21.1                                      | Chr 21, 19.476 Mb | Complex TCTA/TCTG       | 12-41.2
Penta D | 21q22.3                                      | Chr 21, 43.880 Mb | Simple AAAGA            | 2.2-17


1.4.2.2 Y-Chromosomal Short Tandem Repeats (Y-STRs) 

Y-STRs are short tandem repeats (STRs) found on the male-specific
Y-chromosome. The coding genes, mostly found on the short arm of the
Y-chromosome, are vital to male sex determination, spermatogenesis and other
male-related functions (Chakraborty, 1985; Jobling & Tyler-Smith, 2003;
Elhaik, 2014). The Y-STRs are polymorphic among unrelated males and are
inherited through the paternal line with little change through generations
(Malaspina et al., 1990; Hurles & Jobling, 2001).


Y-STRs have been used by forensic laboratories to examine sexual assault
evidence. In a sexual assault case, evidence such as vaginal swabs will contain
both female and male DNA. Differential extraction is often used to separate
the male component from the female component. Often, however, the male and
female components cannot be separated completely (Cerri et al., 2003). As a
result, the female component can remain prominent even in the male fraction
after separation. When the “male DNA sample” undergoes the PCR amplification
process, the female DNA component is amplified as well, sometimes masking
the male DNA, which makes analysis difficult (Jobling et al., 1997; de
Knijff, 2000; Corach et al., 2001).


Masking does not occur when Y-STRs are examined. Since there is no Y-STR
in the female DNA, any Y-STR contribution can come only from the
assailant(s) in a sexual assault case (Corach et al., 2001). The male
component will be easily detected, since only this part of the DNA will be
amplified. The Y-STR system is especially helpful when there is more than
one assailant. The mixed pattern in the evidence can help to identify those
males responsible for the assault. Y-STRs are also used for non-sexual assault
cases where mixed samples are collected from evidence. Sometimes, autosomal
STR typing will show the masking effect if there is a very small quantity of
male DNA in the mixed sample. Performing Y-STR testing can help to identify
all males who have contributed to the evidence (Mathias et al., 1994; de
Knijff et al., 1997; Ballantyne et al., 2010).


In 1992, Lutz Roewer and colleagues described the first polymorphic Y- 
chromosome marker Y-27H39 - now better known as the STR locus DYS19. 
For the next ten years, discovery of polymorphic tandem repeat markers on 
the Y-chromosome progressed much more slowly than for their autosomal 
counterparts. Only 30 markers were available to researchers in 2002. In the
last decade, however, 200 new STR markers have been uncovered through
extensive research on the Y-chromosome (Lahn et al., 2001). The rapid growth
in the discovery of new Y-STR markers is a direct result of the availability of
DNA sequence information from the Human Genome Project and improved
bioinformatics tools for searching DNA sequence databases (Lander et al.,
2001).


In 1997, the European forensic community settled on a core set of Y-STR
markers or “minimal haplotype” that includes DYS19, DYS389I/II, DYS390,
DYS391, DYS392, DYS393, and DYS385 a/b, with YCAII a/b as an optional
marker to create an “extended haplotype”. Most Y-chromosome data to date
have been generated with these loci. In early 2003, the U.S. Scientific Working
Group on DNA Analysis Methods (SWGDAM) selected a core set of markers
that includes the 9 markers in the minimal haplotype plus DYS438 and
DYS439 (Bar et al., 1997; Beleza et al., 2003; Butler et al., 2008).

Figure 1.2: Position of various Y-STRs on Y-chromosome (Redd et al., 2002)

In forensic science, the Y-chromosome is analyzed for the following purposes:

(1) Forensic casework on sexual assault evidence: male-specific
amplification can be done through Y-STR analysis, which can avoid
differential extraction to separate sperm and epithelial cells.

(2) Paternity testing: male children can be tied to fathers in motherless
paternity cases.

(3) Missing persons investigations: patrilineal male relatives may be used
for reference samples.

(4) Human migration and evolutionary studies: lack of recombination
enables comparison of male individuals separated by large periods of time.

(5) Historical and genealogical research: surnames are usually retained by
males, so Y-STRs can make links where the paper trail is limited.
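
The way a mixed Y-STR profile is compared with reference males can be
sketched in a few lines. The example below is a simplified illustration under
stated assumptions: each locus is represented as a set of observed alleles, a
reference male is treated as excluded if any of his alleles is absent from
the evidence (which ignores allele dropout in low-level samples), and the
minimum number of male contributors is taken as the largest number of
distinct alleles seen at any single-copy locus. The function names and the
data layout are hypothetical.

```python
# Single-copy loci of the minimal haplotype; DYS385 a/b is left out of the
# contributor count because one male can show two alleles at that locus.
SINGLE_COPY_LOCI = (
    "DYS19", "DYS389I", "DYS389II", "DYS390", "DYS391", "DYS392", "DYS393",
)


def is_excluded(reference, evidence):
    """True if the reference haplotype cannot be part of the evidence profile.

    Both arguments map locus name -> set of alleles. Loci that were not
    typed in the evidence are skipped rather than counted as mismatches.
    """
    for locus, alleles in reference.items():
        observed = evidence.get(locus)
        if observed is None:
            continue
        if not set(alleles) <= set(observed):
            return True
    return False


def min_male_contributors(evidence):
    """Lower bound on the number of males contributing to the evidence."""
    counts = [
        len(set(evidence[locus]))
        for locus in SINGLE_COPY_LOCI
        if locus in evidence
    ]
    return max(counts, default=0)
```

A failure to exclude is not an identification: because the Y-chromosome is
inherited largely unchanged, all patrilineal relatives of a male are expected
to share his haplotype.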


1.4.2 SEPARATION AND DETECTION OF PCR PRODUCTS 

A polymerase chain reaction (PCR) in which short tandem repeat (STR)
alleles are amplified produces a mixture of DNA molecules that presents a
challenging separation problem. A multiplex PCR can produce 20 or more
different-sized DNA fragments representing different alleles that must be
resolved from one another. PCR products from STR loci must be separated in
a fashion that allows each allele, including the two alleles of a
heterozygote, to be distinguished from the others. This is typically done
with a size-based separation method known as electrophoresis. The separation
medium may be in the form of a slab gel or a capillary (Allen et al., 1989).
Capillary
electrophoresis (CE) is a relatively new addition to the electrophoresis family. 
The first CE separations of DNA were performed in the late 1980s (Maxam & 
Gilbert, 1977; Sanger et al., 1977). Since the introduction of new CE
instrumentation in the mid-1990s, the technique has gained rapidly in
popularity for routine forensic analyses. While slab-gel electrophoresis has
been a proven technique for over 40 years, there are a number of advantages 
to analyzing DNA in a capillary format. First and foremost, the injection, 
separation, and detection steps can be fully automated, permitting multiple 
samples to be run unattended by CE. In addition, only tiny quantities of
sample are consumed in the injection process, leaving enough sample to be
easily retested if needed. This is an important advantage for precious forensic
specimens that often cannot be easily replaced. Separation in capillaries may
be conducted in minutes rather than hours because the improved heat
dissipation of capillaries permits higher voltages. Another advantage
is that CE instruments are designed such that quantitative information is
readily available in an electronic format following the completion of a run. No
extra steps such as scanning the gel or taking a picture of it are required
(McCord et al., 1993; Butler et al., 1994).


Over the years a number of methods have been used for detecting DNA 
molecules following electrophoretic separation. Early techniques involved 
radioactive labels and autoradiography. These methods were sensitive and 
effective but time consuming. In addition, the use of radioisotopes was 
expensive due to the need for photographic films and supplies and the 
extensive requirements surrounding the handling and disposal of radioactive 
materials. Since the late 1980s, methods such as silver staining and 
fluorescence techniques have gained in popularity for detecting STR alleles 
due to their low cost, in the case of silver staining, and their capability of 
automating the detection, in the case of fluorescence (Livak et al., 1995; 
Butler et al., 2001). The first capillary electrophoresis (CE) separations of
short tandem repeat (STR) alleles were performed in late 1992 using
nondenaturing conditions with the polymerase chain reaction (PCR) products
in a double-stranded form. Fluorescent intercalating dyes were used to
visualize the DNA with laser-induced fluorescence detection and to promote 
the resolution of closely spaced alleles. Internal standards were used to 
bracket the alleles in order to perform accurate STR genotyping. An allelic 
ladder was first run with the internal standards to calibrate the DNA migration 
times followed by analysis of the samples with the same internal standards. 
Now, fluorescence-based detection assays are widely used in forensic 
laboratories due to their capabilities for multicolor analysis as well as rapid 
and easy-to-use formats. In the application to DNA typing with STR markers,
the fluorescent dye is attached to a PCR primer that is incorporated into the 
amplified target region of DNA. Amplified STR alleles are visualized as 
bands on a gel or represented by peaks on an electropherogram. A 
fluorescence detector is a photosensitive device that measures the light 
intensity emitted from a fluorophore. Detection of low-intensity light may be 
accomplished with a photomultiplier tube (PMT) or a charge-coupled device 
(CCD). The action of a photon striking the detector is converted to an electric 
signal. The strength of the resultant current is proportional to the intensity of 
the emitted light. This light intensity is typically reported in arbitrary units, 
such as relative fluorescence units (RFUs). Multi-component spectral analysis 
is performed by testing a standard set of DNA fragments labeled with each 
individual dye. Computer software provided with the CE instrument then 
analyzes the data from each of the dyes. Use of different colored fluorescent 
dyes has made it possible to analyze different STR loci simultaneously, each 
with its own color label (Gill et al., 2001). The computer software converts
the raw data into fragment sizes, which are then matched against the allelic
ladder to determine the number of repeats at each STR locus in the DNA
samples.
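
The conversion from raw data to an allele call described above can be
pictured as two steps: sizing a peak against the internal standards that
bracket it, then matching the size to the allelic ladder. The sketch below is
illustrative only. It assumes plain linear interpolation between the two
bracketing standard peaks, which is simpler than the sizing algorithms used
in commercial software, and a matching window of 0.5 bp around each ladder
allele. The function names, the window and the data layout are hypothetical.

```python
from bisect import bisect_right


def size_fragment(migration_time, standards):
    """Estimate fragment size (bp) from migration time.

    `standards` is a list of (migration_time, size_bp) pairs for the
    internal size standard, sorted by migration time. The size is linearly
    interpolated between the two standard peaks that bracket the sample peak.
    """
    if len(standards) < 2:
        raise ValueError("at least two size-standard peaks are required")
    times = [t for t, _ in standards]
    if migration_time < times[0] or migration_time > times[-1]:
        raise ValueError("peak is not bracketed by the size standard")
    upper = min(bisect_right(times, migration_time), len(standards) - 1)
    (t0, s0), (t1, s1) = standards[upper - 1], standards[upper]
    return s0 + (s1 - s0) * (migration_time - t0) / (t1 - t0)


def call_allele(size_bp, ladder, tolerance=0.5):
    """Return the ladder allele nearest to `size_bp`, or None if off-ladder.

    `ladder` maps allele label -> size (bp) measured for the allelic ladder.
    """
    allele, ladder_size = min(ladder.items(), key=lambda kv: abs(kv[1] - size_bp))
    if abs(ladder_size - size_bp) <= tolerance:
        return allele
    return None
```

With invented standards at (10.0 min, 100 bp), (12.0 min, 150 bp) and
(14.0 min, 200 bp), a peak at 13.0 min is sized at 175 bp; against an invented
ladder of {"9": 171.0, "10": 175.2, "11": 179.1} it is called as allele 10,
whereas a 177 bp peak falls outside every window and is reported as
off-ladder.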


1.5 INDIAN POPULATION DIVERSITY


India is a country with enormous social and cultural diversity due to its
position at the crossroads of many historic and pre-historic human
migrations. The hierarchical caste system in the Hindu society dominates the 
social structure of the Indian populations (Bhasin & Walter, 2001). The origin 
of the caste system in India is a matter of debate with many linguists and 
anthropologists suggesting that it began with the arrival of Indo-European 
speakers from Central Asia about 3500 years ago. Previous genetic studies 
based on Indian populations failed to achieve a consensus in this regard 
(Thanseem et al., 2006). Indian populations are classified into various caste, 
tribe and religious groups, which altogether makes them very unique 
compared to rest of the world. India is considered as a treasure for the 
geneticists and evolutionary scholars as it is conglomerated with 4,635 
anthropologically well-defined populations, among which 532 are tribes, 
including 72 primitive tribes (36 hunters and gatherers). They differ from each 
other with respect to their language, social structure, dress and food habits, 
marriage practices, physical appearance and genetic architecture. India 
harbours a variety of geographical realms that give refuge to diverse humans 
and a verity of microbes, plants and animals. In India, four major language 
families are spoken such as Indo-European, Dravidian, Austro-Asiatic, and 
Tibeto-Burman. In addition, India has enigmatic Andaman-Nicobar Islanders, 
whom we predicted as the descendants of early group of modern humans 
(Papiha, 1996; Thangaraj et al., 2003; Tamang & Thangaraj, 2003; Tamang et 
al., 2012). 
Figure 1.3: Possible routes of modern human migrations to the Indian 
subcontinent (Tamang et al., 2012) 


1.5.1 SOCIAL STRATIFICATION AMONG HINDU SOCIETY 

The caste system has persisted in Indian Hindu society for around 3,500 years. 
Like the Y chromosome, caste is defined at birth, and males cannot change 
their caste (Zerjal et al., 2007). The caste system was a typical feature of the 
Hindu society and it divided Hindus into four categories viz. Brahmins, 
Kshatriyas, Vaishyas and Sudras. Brahmins were primarily involved in 
teaching and performing rituals, Kshatriyas were rulers and defended the 
territory, Vaishyas were businessmen and the Sudras served as the labourers. 
Further, each caste is subdivided into subcastes and subcastes into multiple 
Gotras. The caste system became the governing factor of all socio-religious 
and economic activities of people. The tribes remained isolated from the other 
groups and occupied relatively remote places. Several religious communities 
built up in mainland India over the course of time through several waves of 
migration from different directions. The rise of the majority of religious 
groups was basically due to cultural adaptations (Majumder, 1998). 


1.5.2 POPULATIONS OF ODISHA 

Odisha is located in the southeast coastal region of India. It is endowed with 
nature’s bounty: a 482km stretch of coastline with virgin beaches, serpentine 
rivers, mighty waterfalls, and the forest-clad blue hills of the Eastern Ghats 
with their rich wildlife. Odisha is also dotted with exquisite temples, 
historic monuments and feats of modern engineering. With a rich heritage 
that is more than two thousand years old, Odisha has a glorious history of its 
own. It was known under different names in different periods: Kalinga, Utkal, 
Odradesha or Orissa. Seaports flourished along the coast as early as the 4th 
and 5th centuries B.C., when the sadhabs, the Odishan seafaring merchants, 
went to the islands of Java, Sumatra, Borneo and Bali with their merchandise. 
Not only did they bring home wealth and prosperity, they also carried Indian 
civilization with them and helped it spread abroad. Odisha has a population of 
about three crore (Census of India, 2011). Odisha is inhabited by various 
population groups belonging to different strata of the hierarchical caste 
system. The non-tribal populations of Odisha belong to four different castes: 
Brahmins, Khandayats, Karans and Gope. Khandayats belong to an ancient warrior 
group also known as Kshatriya and constitute over 30% of the state’s 
population. 
1.5.3 Y-CHROMOSOME BASED POPULATION STUDIES 

Y-chromosome markers (STRs and SNPs) are located in the non-recombining 
region of Y-chromosome (NRY) and can preserve paternal history. Very 
recently, there has been an increasing use of hundreds of thousands of 
autosomal SNPs to deduce population structure (Reich et al., 1991; Chaubey 
et al., 2011; Saha et al., 2003; Sahoo et al., 2006). Y-chromosome and 
mtDNA markers have been extensively used to infer peopling of different 
continents/countries and to trace the maternal and paternal lineages of 
different populations (Hammer, 1995; Bamshad et al., 2001; Kivisild et al., 
2003; Thangaraj et al., 2003; Bamshad & Wooding, 2003; Basu et al., 2003; 
Thanseem et al., 2006; Thangaraj et al., 2007). The most accepted model for 
human origin and migration is known as ‘out-of-Africa’, which suggests that 
modern humans originated in Africa and subsequently migrated and expanded to 
different continents through a southern coastal route between 60,000 and 
85,000 ybp (Thangaraj et al., 2007). The southern coastal route hypothesis 
proposes that a small group of modern humans, after crossing the Fertile 
Crescent, entered India, followed by southeast Asia and subsequently (50,000 
to 60,000 ybp) Australia and the rest of the world (Thangaraj et al., 2003; 
Macaulay et al., 2005). Recently, the early peopling of Europe has been dated 
to approximately 45,000 ybp, and many more corrections to the previous dating 
have been put forward (Callaway, 2012). 


The DNA-based studies on Indian populations began during the early 1990s. 
However, some of the initial studies dealt with populations that were neither 
anthropologically well-defined nor truly representative of Indian 
populations (Semino et al., 1991; Passarino et al., 1992; Soodyall & Jenkins, 
1992; Barnabas et al., 1996). Mountain et al. (1995) were probably the first 
to address the demographic history of India, based on sequencing of the 
mitochondrial control (D-loop) region. In a study dealing with the 9 bp 
deletion located in the mitochondrial genome among a number of tribal and 
caste populations of southern India, Watkins et al. (1999) suggested multiple 
origins of the 9 bp deletion in southern India, indicating heterogeneity among 
Indians. The traces of socio-cultural, linguistic and physiographical 
boundaries and of the evolutionary forces leading to diversity are well 
documented in recent studies. The most widely accepted view is that the 
peopling of India is very ancient, with more recent gene flow from west and 
east Eurasia 
(Kivisild et al., 1999; Bamshad et al., 2001; Misra, 2001; Basu et al., 2003; 
Thangaraj et al., 2003; Thangaraj et al., 2005; Underhill et al., 2010; Chaubey 
et al., 2008, 2011; Chandrasekar et al., 2009). The vast majority (> 98%) of the 
Indian maternal gene pool, consisting of Indo-European and Dravidian 
speakers, is genetically more or less uniform. Invasions after the late 
Pleistocene settlement might have been mostly male-mediated. However, Y- 
SNP data provides compelling genetic evidence for a tribal origin of the lower 
caste populations in the subcontinent. Lower caste groups might have 
originated with the hierarchical divisions that arose within the tribal groups 
with the spread of Neolithic agriculturalists, much earlier than the arrival of 
Aryan speakers. The Indo-Europeans established themselves as upper castes 
among this already developed caste-like class structure within the tribes 
(Thangaraj et al., 2006; Thangaraj et al., 2010; Sengupta et al., 2006; 
Eaaswarkhanth et al., 2010). In the last decade, several studies have also been 
carried out to study the origin of various Indian castes (Thangaraj et al., 2007; 
Frank et al., 2008; Mukherjee et al., 2009; Nair et al., 2011; Khurana et al., 
2014). 
2.1 HUMAN GENOME PROJECT 


DNA (deoxyribonucleic acid) in its present form was first discovered by 
Friedrich Miescher (1869). Its composition and bio-chemical nature were 
enunciated by Levene (1919). The double helical structure of DNA was 
discovered by Watson, Crick and Wilkins (1953). The 50th anniversary of 
DNA structure discovery was marked by successful completion of Human 
Genome Project (Lander et al., 2001; Venter et al., 2003). The Human 
Genome Project was a 13-year-long, publicly funded project initiated in 1990 
with the objective of determining the DNA sequence of the entire human 
genome. In its early days, the Human Genome Project was met with 
skepticism by many people, including scientists and nonscientists alike. One 
prominent question was whether the huge cost of the project would outweigh 
the potential benefits. Today, however, the overwhelming success of the 
Human Genome Project is readily apparent. Not only did the completion of 
this project usher in a new era in genomics, but it also led to significant 
advances in the types of technology used to sequence DNA (Waterson et al., 
2003). Humans are identical over most of their genomes. Thus, only a 
relatively small number of genetic differences have resulted in the striking 
variation seen among individuals of our species. This phenotypic variation 
among humans was the subject of a recent study by Luis B. Barreiro and his 
colleagues at the Pasteur Institute in Paris (Barreiro et al., 2008). In particular, 
Barreiro and his colleagues were interested in how natural selection has led to 
phenotypic differences. 


When we think of variation between people, we often think of differences in 
height, weight, and skin color. Each of these characteristics is only partially 
controlled by genes. The complex interaction between genes and the 
environment, as well as between multiple genes, makes trying to understand 
and quantify human phenotypic variation difficult. Therefore, instead of 
looking at complex human traits, Barreiro and his colleagues went straight to 
the source and looked for nucleotide sequences in the genome that could tell 
them about individual human variation. For this study, the identification of 
single base changes (single nucleotide polymorphisms, or SNPs) was 
considered ideal. Barreiro and his colleagues obtained data for their research 
from the HapMap project, an international consortium that has built a vast and 
growing repository of human genetic variation. To date, the project has 
analyzed over 3.1 million SNPs across the human genome common to 270 
individuals of African, Asian, and European ancestry (International HapMap 
Consortium, 2003, 2005). 


A SNP is a variation of a single nucleotide between individuals. These 
polymorphisms can therefore be used to discern small differences both within 
a population and among different populations (Ramana et al., 2001). The 
beauty of SNPs is that the observed variation can be followed over time and 
quantified. If SNPs change either the function of a gene or its expression, and 
the change provides greater fitness for a population (i.e., a higher capacity to 
survive and/or reproduce in a given environment), the change will be favored 
by natural selection. Therefore, SNPs can be the basis of evolutionary change. 
This was the basic premise of Barreiro's study. 


Simple tandem-repetitive regions of DNA (or ‘minisatellites’) which are 
dispersed in the human genome frequently show substantial length 
polymorphism arising from unequal exchanges which alter the number of 
short tandem repeats in a minisatellite. The repeat elements in a subset of 
human minisatellites share a common 10-15 base-pair (bp) ‘core’ sequence 
which might act as a recombination signal in the generation of these 
hypervariable regions (Ludwig et al., 1989). A hybridization probe consisting 
of the core repeated in tandem can detect many highly polymorphic 
minisatellites simultaneously to provide a set of genetic markers of general 
use in human linkage analysis. Other variant probes can detect additional sets 
of hypervariable minisatellites to produce somatically stable DNA 
‘fingerprints’ which are completely specific to an individual (or to his or her 
identical twin) and can be applied directly to problems of human 
identification, including parenthood testing (Jeffreys et al., 1985; Jeffreys et 
al., 1986; Jeffreys et al., 1988). 


2.2 REPEATED DNA SEQUENCES & GENETIC MARKERS 


Since it has been estimated that over 99.7% of the human genome is the same 
from individual to individual, regions that differ need to be found in the 
remaining 0.3% in order to tell people apart at the genetic level. There are 
many repeated DNA sequences scattered throughout the human genome. As 
these repeat sequences are typically located between genes, they can vary in 
size from person to person without impacting the genetic health of the 
individual (Venter et al., 2003; Jobling, 2012). 


Human genomes are full of repeated DNA sequences (Ellegren, 2004). These 
repeated DNA sequences come in all sizes and are typically designated by the 
length of the core repeat unit and the number of contiguous repeat units or the 
overall length of the repeat region. Long repeat units may contain several 
hundred to several thousand bases in the core repeat. These regions are often 
referred to as satellite DNA and may be found surrounding the chromosomal 
centromere. The term satellite arose because one or more minor “satellite 
bands” were frequently seen in early experiments involving equilibrium 
density gradient centrifugation (Britten & Kohne, 1968; Primrose, 1998). The 
core repeat unit for a medium-length repeat, sometimes referred to as a 
minisatellite or a VNTR (variable number of tandem repeats), is in the range 
of approximately 8 base pairs (bp) to 100bp in length (Nakamura et al., 1987; 
Boerwinkle et al., 1989; Odelberg et al., 1989; Tautz, 1993). The most 
commonly used minisatellite marker in the 1990s was D1S80, which has a 
16bp repeat unit and contains alleles spanning the range of 14 to 41 repeat 
units (Kasai et al., 1990; Budowle et al., 1991; Butler, 2010). DNA regions 
with repeat units that are 2bp to 7bp in length are called microsatellites, 
simple sequence repeats (SSRs), or most commonly short tandem repeats (STRs). 
STRs have become popular DNA repeat markers because they are easily 
amplified by the polymerase chain reaction (PCR) without the problems of 
differential amplification (Horn et al., 1989; Kimpton et al., 1993; Kimpton et 
al., 1994; Kimpton et al., 1996). This is because both alleles from a 
heterozygous individual are similar in size since the repeat size is small. The 
number of repeats in STR markers can be highly variable among individuals, 
which makes these STRs effective for human identification purposes (Litt & 
Lutty, 1989). Literally thousands of polymorphic microsatellites have been 
characterized in human DNA and there may be more than a million 
microsatellite loci present depending on how they are counted (Ellegren, 
2004). Regardless, microsatellites account for approximately 3% of the total 
human genome (International Human Genome Sequencing Consortium, 
2001). STR markers are scattered throughout the genome and occur on 
average every 10,000 nucleotides (Edwards et al., 1991). However, not all 
STR loci exhibit variability between individuals. Computer searches of the 
recently available human genome reference sequence have cataloged the 
number and nature of STR markers in the genome (Gill, 2002). A large 
number of STR markers have been characterized by academic and commercial 
laboratories for use in disease gene location studies (Broman et al., 1998; 
Ghebranious et al., 2003). To perform analysis on STR markers, the invariant 
flanking regions surrounding the repeats must be determined. Once the 
flanking sequences are known then PCR primers can be designed and the 
repeat region amplified for analysis. New STR markers are usually identified 
in one of two ways: (1) searching DNA sequence databases such as GenBank 
for regions with more than six or so contiguous repeat units (Weber & May, 
1989; Collins et al., 2003; Subramanian et al., 2003); or (2) performing 
molecular biology isolation methods (Edwards et al., 1991; Chambers & 
MacAvoy, 2000). 
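
The first approach, scanning sequence data for runs of contiguous repeat 
units, can be sketched with a regular expression. This is a minimal sketch 
under stated assumptions: the function name, the 2bp to 6bp unit range, the 
threshold of six repeats (the text says "six or so") and the test sequence are 
illustrative choices, and only perfect, uninterrupted repeats are reported. 

```python
import re


def find_strs(sequence, min_unit=2, max_unit=6, min_repeats=6):
    """Return (start, motif, repeat_count) for perfect tandem repeats.

    A hit is a motif of min_unit..max_unit bases that occurs at least
    `min_repeats` times in a row. Motifs that are themselves repeats of a
    shorter unit (e.g. "ATAT") are skipped so each tract is reported once.
    """
    hits = []
    for unit in range(min_unit, max_unit + 1):
        # Group 1 captures the motif; \1{n,} requires n further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, min_repeats - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            is_redundant = any(
                motif == motif[:k] * (unit // k)
                for k in range(1, unit)
                if unit % k == 0
            )
            if not is_redundant:
                hits.append((match.start(), motif, len(match.group(0)) // unit))
    return hits


# Hypothetical sequence: flanking bases around eight GATA repeats.
example = "CCGT" + "GATA" * 8 + "TTGCA"
found = find_strs(example)
```

Once a repeat tract has been located in this way, the invariant sequence on 
either side of it is where PCR primers would be placed. 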


2.2.1 FORENSIC DNA TYPING 

For human identification purposes it is important to have DNA markers that 
exhibit the highest possible variation or a number of less polymorphic markers 
that can be combined in order to obtain the ability to discriminate between 
samples. Forensic specimens are often challenging to PCR amplify because 
the DNA in the samples may be severely degraded. Mixtures are prevalent as 
well in some forensic samples, such as those obtained from sexual assault 
cases containing biological material from both the perpetrator and victim 
(Griffiths et al., 1998; Gonzalez et al., 2001; Hanson & Ballantyne, 2007). 
The small size of STR alleles, which are built from 2bp to 7bp repeat units, 
compared to minisatellite VNTR alleles (400bp to 1000bp) makes the STR 
markers better candidates for use in forensic applications where degraded DNA 
is common (Bar et al., 1997). PCR amplification of degraded DNA samples can be 
better accomplished with smaller product sizes. These reduced-size STR 
amplicons are often referred to as miniSTRs. Allelic dropout of larger alleles 
caused by preferential amplification of the smaller allele is also a 
significant problem with minisatellites. Furthermore, single-base resolution 
of DNA fragments can be obtained more easily with sizes below 500bp using 
high-resolution capillary electrophoresis. Thus, for both biological and 
technological reasons the smaller STRs are advantageous compared to the larger 
minisatellites (VNTRs). 
Among the various types of STR systems, tetranucleotide repeats have 
become more popular than di- or trinucleotides. Penta- and hexanucleotide 
repeats are less common in the human genome but are being examined by 
some laboratories (Hammond et al., 1994). A biological phenomenon known 
as “stutter” results when STR alleles are PCR amplified. Stutter products are 
amplicons that are typically one or more repeat units less in size than the true 
allele and arise during PCR because of strand slippage (Walsh et al., 1996). 
Stutter product amounts vary depending on the STR locus and even the length 
of the allele within the locus but are usually less than 15% of the allele 
product quantity with tetranucleotide repeats. With di- and trinucleotides, the 
stutter percentage can be much greater (30% or more) making it difficult to 
interpret sample mixtures. In addition, the four-base spread in alleles with 
tetranucleotides makes closely spaced heterozygotes easier to resolve with 
size-based electrophoretic separations compared to alleles that could be two or 
three bases different in size with dinucleotide and trinucleotide markers, 
respectively (Kirby, 1992; Pascali et al., 1998; Grignani et al., 2000; Bieber et 
al., 2006). 
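
The stutter behaviour described above suggests a simple screening rule, 
sketched here under stated assumptions. The function name and the peak data 
are hypothetical, the 15% cut-off is taken from the figure quoted above for 
tetranucleotide repeats rather than from any validated laboratory threshold, 
and only stutter one repeat unit shorter than the parent allele is considered. 

```python
def flag_stutter(peaks, repeat_len=4, max_ratio=0.15):
    """Return the sizes of peaks that are consistent with stutter.

    `peaks` maps fragment size (bp) to peak height (RFU). A peak is flagged
    when a peak exactly one repeat unit longer exists and the shorter peak's
    height is at most `max_ratio` of that longer (parent) peak.
    """
    stutter = []
    for size, height in peaks.items():
        parent_height = peaks.get(size + repeat_len)
        if parent_height is not None and height / parent_height <= max_ratio:
            stutter.append(size)
    return sorted(stutter)


# Hypothetical peaks at one tetranucleotide locus.
observed = {120: 90, 124: 1000, 128: 400, 132: 950}
flagged = flag_stutter(observed)
```

In this example the 120bp peak (9% of the 124bp peak) is flagged, while the 
128bp peak (about 42% of the 132bp peak) is not, so it would have to be 
considered as a possible allele from a second contributor. 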


2.2.2 TYPES OF STR MARKERS 

STR repeat sequences are named by the length of the repeat unit. Dinucleotide 
repeats have two nucleotides repeated next to each other. Trinucleotides have 
three nucleotides in the repeat unit, tetranucleotides have four, 
pentanucleotides have five, and hexanucleotides have six nucleotides in the 
core repeat. However, because microsatellites are tandemly repeated, some 
motifs are actually equivalent to others. STRs are often divided into several 
categories based on the repeat pattern. Simple repeats contain units of 
identical length and sequence, compound repeats comprise two or more 
adjacent simple repeats, and complex repeats may contain several repeat 
blocks of variable unit length as well as variable intervening sequences 
(Urquhart et al., 1994). Complex hypervariable repeats also exist with 
numerous non-consensus alleles that differ in both size and sequence and are 
therefore challenging to genotype reproducibly (Urquhart et al., 1993; Gill et 
al., 1994). This last category of STR markers is not as commonly used in 
forensic DNA typing due to difficulties with allele nomenclature and 
measurement variability between laboratories, although several 
commercial kits now include the complex hypervariable STR locus SE33, 
sometimes called ACTBP2 (Urquhart et al., 1993). 
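
The equivalence of motifs noted earlier in this section follows from the fact 
that a tandem repeat has no fixed starting base and can be read from either 
strand, so GATA, AGAT and TCTA all describe the same repeat. The sketch below 
groups equivalent motifs by reducing each to one representative. Choosing the 
alphabetically smallest form is an assumption made for illustration only and 
is not a nomenclature rule stated in the text. 

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_motif(motif):
    """Return one representative for all motifs equivalent to `motif`.

    Equivalent forms are every rotation of the motif and every rotation of
    its reverse complement; the alphabetically smallest one is returned.
    """
    motif = motif.upper()
    reverse_complement = motif.translate(COMPLEMENT)[::-1]
    candidates = []
    for strand in (motif, reverse_complement):
        for i in range(len(strand)):
            candidates.append(strand[i:] + strand[:i])
    return min(candidates)


same_repeat = canonical_motif("GATA") == canonical_motif("TCTA")
```

Under this convention GATA, TAGA and TCTA all reduce to AGAT, whereas GAAA 
reduces to AAAG and is therefore a different repeat. 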


2.2.3 Y-CHROMOSOME MARKERS 

The Y-chromosome and mitochondrial DNA (mtDNA) markers are known as 
“lineage markers.” They are passed down from generation to generation 
without changing (except for mutational events). Maternal lineages can be 
traced with mitochondrial DNA sequence information while paternal lineages 
can be followed with Y-chromosome markers (Graves, 1995; Graves et al., 
1998; Bower, 2000, 2003; Brown, 2002). With lineage markers, the genetic 
information from each marker is referred to as a haplotype rather than a 
genotype because there is usually only a single allele per individual. Because 
Y-chromosome markers are linked on the same chromosome and are not 
shuffled with each generation, the statistical calculations for a random match 
probability cannot involve the product rule. Therefore, haplotypes obtained 
from lineage markers can never be as effective in differentiating between two 
individuals as genotypes from autosomal markers that are unlinked and 
segregate separately from generation to generation (Jegalian & Lahn, 2001). 
However, Y-chromosome, mitochondrial DNA, and X-chromosome markers 
can play an important role in forensic investigations as well as other human 
identification applications (Lahn et al., 2001; Gill et al., 2001). 
Y-chromosome analysis is also being utilized in anthropological investigation 
(human migration studies) and it can lead to accurate estimation of the time 
to the most recent common ancestor (TMRCA) (Poznik et al., 2013). 
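
The statistical difference between unlinked autosomal genotypes and linked 
haplotypes can be illustrated with a short sketch. The product rule multiplies 
per-locus genotype frequencies, which is valid only when loci are inherited 
independently. For haplotypes, the sketch counts how often the whole haplotype 
occurs in a reference database. This counting approach is an assumption 
introduced here for illustration and is not described in the text above, and 
all frequencies and haplotypes shown are invented. 

```python
from math import prod


def product_rule(genotype_frequencies):
    """Random match probability for unlinked loci: the product of the
    per-locus genotype frequencies."""
    return prod(genotype_frequencies)


def haplotype_frequency(haplotype, database):
    """Fraction of database entries identical to the whole haplotype.

    The haplotype is treated as a single unit because its loci are linked.
    """
    return database.count(haplotype) / len(database)


# Hypothetical genotype frequencies at three unlinked autosomal loci.
autosomal_rmp = product_rule([0.1, 0.05, 0.2])

# Hypothetical database of two-locus Y-STR haplotypes (allele at each locus).
DATABASE = [(14, 13), (15, 13), (14, 13), (16, 12), (15, 14)]
y_frequency = haplotype_frequency((14, 13), DATABASE)
```

With these invented numbers the autosomal profile is expected in about 1 in 
1,000 individuals, whereas the haplotype estimate (2 in 5) can never be 
smaller than one over the size of the database. 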


A detailed analysis of the “finished” reference Y-chromosome sequence was 
described in the June 19, 2003 issue of Nature by researchers from the 
Whitehead Institute and Washington University. Although it is stated as being 
a “finished” sequence, Skaletsky et al. (2003) report on only 23Mb of the 
roughly 50Mb present in a typical human Y-chromosome. The unreported 
and as yet unsequenced 30Mb portion is a heterochromatin region located on 
the long arm of the Y-chromosome that is not transcribed and is composed of 
highly repetitive sequences, which are impossible to sequence reliably with 
current technology. At 50Mb, the Y-chromosome is the third smallest human 
chromosome only slightly larger than chromosome 21 (47Mb) and 
chromosome 22 (49Mb). The tips of the Y-chromosome, which are called the 
pseudo-autosomal regions (PAR), recombine with the homologous regions of the 
X-chromosome. PAR1 located at the tip of the short arm 
(Yp) of the Y-chromosome is approximately 2.5Mb in length while PAR2 at 
the tip of the long arm (Yq) is less than 1Mb in size (Graves et al., 1998). The 
remainder of the Y-chromosome (95%) is known as the non-recombining 
portion of the Y-chromosome, or NRY. The NRY remains the same from 
father to son unless a mutation occurs. Some authors term the NRY the 
male-specific region (MSY) because of evidence of frequent gene conversion or 
intra-chromosomal recombination (Skaletsky et al., 2003). A total of 156 
known transcription units including 78 protein-coding genes are present on 
MSY. Many sequences in the Y-chromosome are highly duplicated either 
with themselves or with the X-chromosome. Three classes of sequences have 
been characterized in the Y-chromosome: X-transposed, X-degenerate, and 
ampliconic (Skaletsky et al., 2003). Two blocks on the short arm of Y- 
chromosome with a combined length of 3.4Mb make up the X-transposed 
sequences. These sequences are 99% identical to sequences found in Xq21, 
contain two coding genes, and do not participate in X—Y crossing over during 
male meiosis. X-degenerate segments of MSY occur in eight blocks on both 
the short arm and the long arm of the Y-chromosome with an aggregate length 
of 8.6Mb. These X-degenerate segments possess up to 96% nucleotide 
sequence identity to their X-linked homologues. These X-homologous regions 
can make it challenging to design Y-chromosome assays that generate male- 
specific DNA results. If portions of an X-homologous region of the Y- 
chromosome are examined inadvertently, then female DNA, which possesses 
two X-chromosomes, will be detected. Thus, when testing Y-chromosome- 
specific assays it is important to examine them in the presence of female DNA 
(high levels) to verify that there is little-to-no cross talk with X-homologous 
regions of the Y-chromosome (Butler et al., 2002; Hall & Ballantyne, 2003). 
The ampliconic segments are composed of seven large blocks scattered across 
both the short arm and the long arm and covering about 10.2Mb of the 
Y-chromosome (Skaletsky et al., 2003). Some 60% of these ampliconic 
sequences have intrachromosomal identities of 99.9% or greater. In other 
words, it is very difficult to tell these sequences apart from one another. 
Another interesting feature of these ampliconic segments is that many of them 
are palindromes; that is, the almost exact duplicate sequences are inverted with 
respect to each other, essentially as mirror images. Eight large 
palindromes collectively comprise 5.7Mb of Yq with at least six of these 
palindromes containing testis genes. Genetic markers within these 
palindromic regions will exist as multi-copy PCR products from single primer 
sets. For example, the DAZ (deleted in azoospermia) gene occurs in four copies 
at 24Mb along the reference sequence (Saxena et al., 1996; Skaletsky et al., 
2003; Prinz, 2003; Melissa et al., 2014). 


2.2.3.1 Minimal Haplotype Loci 

The number of Y-STR loci available for use in human identity testing has 
increased dramatically since the turn of the century and the availability of the 
human genome sequence. In the 1990s only a handful of Y-STR markers were 
characterized and available for use and only about 30 Y-STRs were available 
for researchers (Butler, 2003) at the beginning of 2002. These Y-STRs have 
been cataloged and mapped to their Y-chromosome positions (Hanson & 
Ballantyne, 2006). Yet even with a limited number of loci available at the 
time, a core set was selected in 1997 that continues to serve as “minimal 
haplotype” loci (Kayser et al., 1997; Pascali et al., 1998). The minimal 
haplotype is defined by the single copy Y-STR loci DYS19, DYS389I, 
DYS389II, DYS390, DYS391, DYS392, DYS393, and the highly 
polymorphic multi-copy locus DYS385 a/b (Schneider et al., 1998). By means 
of a multicenter study, more than 4000 male DNA samples from 48 different 
subpopulation groups were studied with the single copy loci in the minimal 
haplotype set (de Knijff et al., 1997). This work formed the basis for what is 
now the online Y-STR Haplotype Reference Database (http://www.yhrd.org) 
that will be described in more detail below. 


In January 2003, the U.S. Scientific Working Group on DNA Analysis 
Methods (SWGDAM) recommended use of the minimal haplotype loci plus 
two additional single copy Y-STRs: DYS438 and DYS439 (Ayub et al., 
2000). Information regarding these core loci and other loci present in 
commercial Y-STR kits may be found in Table 2.1. Although other Y-STRs may 
be added to databases as their value is demonstrated and they become part of 
commercially available kits, the original minimal haplotype loci and 
SWGDAM recommended Y-STRs are likely to dominate human identity 
applications in the coming years. 


2.2.3.2 Y-STR Nomenclature 

The DNA Commission of the International Society of Forensic Genetics 
(ISFG) has made a series of recommendations on the use of Y-STR markers 
(Carvalho-Silva et al., 1999; Gill et al., 2001; Gusmao et al., 2006). Their 
recommendations address allele nomenclature, use of allelic ladders, 
population genetics, and reporting methods. The ISFG recommendations for 
Y-STR allelic ladders include the following: (a) the alleles should span the 
distance of known allelic variants for a particular locus, (b) the rungs of the 
ladder should be one repeat unit apart wherever possible, (c) the alleles 
present in the ladder should be sequenced, and (d) the ladders should be 
widely available to enable reliable interlaboratory comparisons. The existence 
of commercially available Y-STR kits has now facilitated the widespread use 
of consistent allelic ladders. Prior to commercially available Y-STR kits and 
consistent allelic ladders, various researchers in the field took different 
approaches to naming alleles. For some loci there were instances of multiple 
published designations for the same allele. An example of this phenomenon 
that illustrates the importance of standardization is DYS439, which has been 
designated three different ways in the literature. In an effort to provide a 
unified nomenclature for STR loci, a comparative analysis of the repeat and 
sequence structure of Y-chromosome markers in humans and chimpanzees 
has been proposed and 11 human Y-STRs have been studied (Gusmao et al., 
2002). Since the chimpanzees examined in their study did not vary in the other 
regions outside of the variable core GATA repeat for DYS439, Gusmao et al. 
(2002) proposed a [GATA]n repeat structure for humans. This nomenclature 
has now been adopted for all commercial STR kits typing DYS439. 


2.2.3.3 Y-STR Kits 

As noted in Chapter 5, forensic scientists rely heavily on commercially 
available kits to perform DNA testing. Thus, many laboratories especially in 
the U.S. were reluctant to move into Y-STR typing until Y-STR kits were 
offered. The two most widely used Y-STR kits are PowerPlex Y (Promega 
Corporation) and Yfiler (Applied Biosystems). All of the European and U.S. 
core Y-STR loci are included in both kits. PowerPlex Y contains one 
additional locus (DYS437) and Yfiler has six additional loci (DYS437, 
DYS448, DYS456, DYS458, DYS635, and GATA-H4). Until 2005, 
ReliaGene Technologies (formerly of New Orleans, LA) sold the Y-PLEX 12 
kit, which amplified the SWGDAM recommended loci plus the amelogenin 
marker. ReliaGene had also supplied Y-PLEX 6 and Y-PLEX 5 kits (Sinha et 
al., 2004), which were precursors to the Y-PLEX 12 kit. Inclusion of 
amelogenin enables confirmation that the PCR reaction has not failed on 
female DNA samples since a single X amplicon will result. In addition, 
mixture levels of male and female DNA can be confirmed in many situations 
with the amelogenin X and Y peak height ratios. While the amelogenin 
primers provide a measure of quality control on PCR amplifications, they 
have the disadvantage of possibly tying up and consuming PCR reagents 
when high levels of female DNA are present in a mixture. 
Table 2.1: Characteristics of Commonly Used Y-STR Loci (Butler, 2006; 
Decker et al., 2007). 


STR Marker     Position (Mb)   Repeat Motif   Allele Range   Mutation Rate 
DYS393          3.19           AGAT           8-17           0.10% 
DYS456          4.33           AGAT           13-18          0.42% 
DYS458          7.93           GAAA           14-20          0.64% 
DYS19          10.13           TAGA           10-19          0.23% 
DYS391         12.61           TCTA           6-14           0.26% 
DYS635         12.89           TSTA           17-27          0.35% 
DYS437         12.98           TCTR           13-17          0.12% 
DYS439         13.03           AGAT           8-15           0.52% 
DYS389 I/II    13.12           TCTR           9-17/24-34     0.25%/0.36% 
DYS438         13.38           TTTTC          6-14           0.03% 
DYS390         15.78           TCTR           17-28          0.21% 
Y-GATA-H4      17.25           TAGA           8-13           0.24% 
DYS385 a/b     19.26           GAAA           7-28           0.21% 
DYS392         21.04           TAT            6-20           0.04% 
DYS448         22.78           AGAGAT         17-24          0.16% 


Y-chromosome DNA testing is important for a number of different 
applications of human genetics including forensic evidence examination, 
paternity testing, historical investigations, studying human migration patterns
throughout history, and genealogical research. In terms of forensic
applications, there are both advantages and limitations to Y-chromosome
testing. The primary value of the Y-chromosome in forensic DNA testing is 
that it is found only in males. The SRY (sex-determining region of the Y) 
gene determines maleness. Since a vast majority of crimes where DNA 
evidence is helpful, particularly sexual assaults, involve males as the 
perpetrators, DNA tests designed to only examine the male portion can be 
valuable. With Y-chromosome tests, interpretable results can be obtained in 
some cases where autosomal tests are limited by the evidence, such as high 
levels of female DNA in the presence of minor amounts of male DNA. These 
situations include sexual assault evidence from azoospermic or vasectomized
males and blood-blood or saliva-blood mixtures where the absence of sperm
prevents a successful differential extraction for isolation of male DNA (Prinz 
& Sansone, 2001). In addition, the number of individuals involved in a “gang 
rape” may be easier to decipher with Y-chromosome results than with highly 
complicated autosomal STR mixtures. Using Y-chromosome-specific PCR 
primers can improve the chances of detecting low levels of the perpetrator’s 
DNA in a high background of a female victim’s DNA (Hall & Ballantyne,
2003). Y-chromosome tests have also been used to verify amelogenin Y-deficient
males (Thangaraj et al., 2002).


The same feature of the Y-chromosome that gives it an advantage in forensic 
testing, namely maleness, is also its biggest limitation. A majority of the Y- 
chromosome is transferred directly from father to son without recombination 
to shuffle its genes and provide greater genetic variety to future generations. 
Random mutations are the only mechanisms for variation over time between 
paternally related males. Thus, while exclusions in Y-chromosome DNA
testing results can aid forensic investigations, a match between a suspect and
evidence only means that the individual in question could have contributed the
forensic stain, as could a brother, father, son, uncle, paternal cousin, or even a
distant cousin from his paternal lineage. Needless to say, inclusions with Y-
chromosome testing are not as meaningful as autosomal STR matches from a
random match probability point of view (de Knijff, 2003). On the other hand,
the presence of relatives having the same Y-chromosome expands the number 
of possible reference samples in missing persons’ investigations and mass 
disaster victim identification efforts. Y-chromosome testing also aids familial
searching (Dettlaff-Kakol & Pawlowski, 2002; Sims et al., 2008). Deficient
paternity tests, where the father is dead or unavailable for testing, benefit
when Y-chromosome markers are used (Santos et al., 1993). However, an
autosomal DNA test is always preferred when possible since it provides a 
higher power of discrimination. The Y-chromosome has also become a 
popular tool for tracing historical human migration patterns through male 
lineages (Jobling & Tyler-Smith, 1995, 2003). Anthropological, historical, 
and genealogical questions can be answered through Y-chromosome results. 
For example, Y-chromosome results in 1998 linked modern-day descendants
of Thomas Jefferson and Eston Hemings, leading to the controversial
conclusion that Jefferson fathered Hemings, who was born into slavery (Foster
et al., 1998).


2.2.3.4 Y-STR Haplotype Databases 

A number of online Y-STR databases exist. The forensic databases contain 
collections of anonymous individuals and can be used to estimate the 
frequency of specified Y-STR haplotypes. The genetic genealogy databases, 
such as Y-search and Y-base, contain Y-STR haplotype information gathered 
by genetic genealogy companies with different sets of loci from males trying 
to make genealogical connections. Thus, the haplotypes in these genealogy
databases are associated with specific individuals and family names.

YHRD

The largest and most widely used forensic and general population genetics Y- 
STR database, known as the Y-STR Haplotype Reference Database (YHRD), 
was created by Lutz Roewer and colleagues at Humboldt University in Berlin,
Germany, and has been available online since 2000 (Roewer, 2003; Willuweit
& Roewer, 2007). As of 2014, YHRD contains results from more than 189,000
samples with minimal haplotype loci results representing 710 different
groups of sample submissions from various populations and countries around
the world. Searches on YHRD may be conducted by population group or
geographic location.
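
To make the idea of estimating a haplotype frequency from a database concrete,
the sketch below applies the simple counting method (observations divided by
database size) together with a one-sided exact binomial upper bound. This is
one commonly used approach and is not taken from YHRD itself; the function
names and the 95% default are assumptions made for illustration.

```python
import math


def counting_estimate(matches: int, database_size: int) -> float:
    """Point estimate of haplotype frequency: observations / database size."""
    return matches / database_size


def binomial_cdf(x: int, n: int, p: float) -> float:
    """P(X <= x) for X ~ Binomial(n, p), with terms computed in log space."""
    if p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 1.0 if x >= n else 0.0
    total = 0.0
    for i in range(x + 1):
        log_term = (
            math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log1p(-p)
        )
        total += math.exp(log_term)
    return min(total, 1.0)


def upper_bound(matches: int, database_size: int,
                confidence: float = 0.95) -> float:
    """One-sided exact (Clopper-Pearson) upper bound, found by bisection."""
    if matches >= database_size:
        return 1.0
    alpha = 1.0 - confidence
    low, high = 0.0, 1.0
    for _ in range(100):
        mid = (low + high) / 2.0
        # The CDF falls as p rises; the bound is the p where it equals alpha.
        if binomial_cdf(matches, database_size, mid) > alpha:
            low = mid
        else:
            high = mid
    return high
```

When a haplotype has not been observed at all, the bound reduces to
1 - alpha^(1/N), so a larger database yields a smaller upper bound on the
frequency of an unobserved haplotype.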
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2.3 FORENSIC & POPULATION GENETICS STUDIES 
Y-STR polymorphisms were first discovered in the 1980s (Tautz, 1989;
Malaspina et al., 1990). Some researchers reported that STR
polymorphisms appear to occur less frequently on the Y chromosome than on
autosomes (Spurdle and Jenkins, 1992).

Roewer et al. (1992) described a tetrameric simple repeat polymorphism
mapped to Yp (DYS19). Polymorphisms at three dimeric Y-STR loci (YCAI,
YCAII, YCAIII) were described by Mathias et al. (1994). These STRs show
moderate levels of polymorphism and are used for routine forensic as well as
anthropological applications (Roewer & Epplen, 1992; Roewer et al.,
1993; Gomolka et al., 1994; Mathias et al., 1992).


Roewer et al. (1996) utilized Y-chromosomal STR polymorphisms for male
identification.


A multicenter study was carried out to characterize 13 polymorphic short
tandem repeat (STR) systems located on the male-specific part of the human
Y chromosome (DYS19, DYS288, DYS385, DYS388, DYS389I/II, DYS390,
DYS391, DYS392, DYS393, YCAI, YCAII, YCAIII, DXYS156Y) (Kayser et
al., 1997). Amplification parameters and electrophoresis protocols, including
multiplex approaches, were compiled. The typing of non-recombining Y loci
with uniparental inheritance requires special attention to population
substructuring due to prevalent male lineages. To assess the extent of these
subheterogeneities, up to 3825 unrelated males were typed in up to 48
population samples for the respective loci. A consistent repeat-based
nomenclature for most of the loci was introduced. The authors estimated the
average mutation rate for DYS19 in 626 confirmed father-son pairs as
3.2 x 10^-3 (95% confidence interval limits of 0.00041-0.00677), a value which
can also be expected for other Y-STR loci with similar repeat structure.
Recommendations are given for the forensic application of a basic set of 7
STRs (DYS19, DYS389I, DYS389II, DYS390, DYS391, DYS392, and
DYS393) for standard Y-haplotyping in forensic and paternity casework.
They further recommend the inclusion of the highly polymorphic bilocal Y-
STRs DYS385, YCAII, and YCAIII for a nearly complete individualization of
almost any given unrelated male individual.


To facilitate evolutionary and forensic studies of DNA polymorphisms on the
Y chromosome, a multiplex DNA typing technique was devised for four
tetranucleotide STR loci (DYS19, DYS390, DYS391, and DYS393) (Redd et
al., 1997). These Y-STR loci were simultaneously amplified with FAM-
labeled primers and genotypes were determined with an automated DNA
sequencer. They typed 162 males from three U.S. populations (African-
Americans, European-Americans and Hispanics) and found that the haplotype
diversities range from 0.920 to 0.969. This quadruplex system provides a
facile means of genotyping these Y chromosome STRs, and should be useful
in population genetic and forensic applications.
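
Haplotype diversity values such as those quoted above are commonly computed
with Nei's unbiased estimator, h = n/(n - 1) * (1 - sum of squared haplotype
frequencies). The sketch below assumes that estimator (the source does not
state which formula was used) and represents each haplotype as a tuple of
allele calls. The function name and example haplotypes are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def haplotype_diversity(haplotypes: Sequence[Hashable]) -> float:
    """Nei's unbiased haplotype diversity for a sample of haplotypes."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("at least two haplotypes are required")
    counts = Counter(haplotypes)
    # Sum of squared relative frequencies of each distinct haplotype.
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


# Hypothetical four-locus haplotypes in the order
# (DYS19, DYS390, DYS391, DYS393); the allele values are made up.
sample = [
    (14, 24, 10, 13),
    (14, 24, 10, 13),
    (15, 23, 11, 13),
    (16, 25, 10, 12),
]
print(round(haplotype_diversity(sample), 4))
```

The estimator equals 1 when every sampled haplotype is different and 0 when all
sampled males share a single haplotype.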


Seven novel microsatellite markers were developed by White et al. (1999).
These microsatellites are tetranucleotide GATA repeats and are polymorphic
among unrelated individuals. Five of the seven markers were male-specific,
with no PCR product being generated from female DNA. The remaining two
markers were polymorphic in both males and females, with many shared
alleles between the sexes.
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Underhill et al., (2000) carried out a study on Y-chromosome sequence 
variation and the history of human populations. The study examined
binary polymorphisms associated with the non-recombining region of the
human Y chromosome (NRY), which preserves the paternal genetic legacy of
our species that has persisted to the present, permitting inference of human
evolution, population affinity and demographic history. They used denaturing
high-performance liquid chromatography (DHPLC) to identify 160 of the 166
bi-allelic and 1 tri-allelic sites that formed a parsimonious genealogy of 116
haplotypes, several of which display distinct population affinities based on the
analysis of 1062 globally representative individuals. Results of the study
suggested that a minority of contemporary East Africans and Khoisan
represent the descendants of the most ancestral patrilineages of anatomically
modern humans that left Africa between 35,000 and 89,000 years ago.


1.33 Mb of sequence from the human Y chromosome was analyzed for tri- to 
hexanucleotide microsatellites (Ayub et al., 2000). Twenty loci containing a 
stretch of eight or more repeat units with complete repeat sequence 
homogeneity were found, 18 of which were novel. Six loci (one tri-, four 
tetra- and one pentanucleotide) were assembled into a single multiplex 
reaction and their degree of polymorphism was investigated in a sample of 
278 males from Pakistan. Diversities of the individual loci ranged from 0.064
to 0.727, while the haplotype diversity was 0.971. One population, the Hazara,
showed particularly low diversity, with predominantly two haplotypes.


The reference database of highly informative Y-chromosomal short tandem
repeat (STR) haplotypes (YHRD) was devised by Roewer et al. (2001). By
September 2014, YHRD contained 136,184 nine-locus haplotypes ("minimal
haplotypes"), 40% of which have been extended further to include two
additional loci. Establishment of YHRD has been facilitated by the joint efforts
of various forensic and anthropological institutions.

Kayser and Sajantila (2001) studied mutations at Y-STR loci and their
implications for paternity testing and forensic analysis. Knowledge about
mutation rates and the mutational process of Y-chromosomal short-tandem-
repeat (STR) or microsatellite loci used in paternity testing and forensic
analysis is crucial for the correct interpretation of resulting genetic profiles.
They analyzed a total of 4999 male germline transmissions from father/son
pairs of confirmed paternity (99.9%) at 15 Y-STR loci and identified 14
mutations. Locus-specific mutation rate estimates varied between 0 and
8.58 x 10^-3, and the overall average mutation rate estimate was 2.80 x 10^-3.
In two confirmed father/son pairs, mutations at two Y-STRs were observed.
The occurrence of two mutations within the same single germline
transmission was estimated to be statistically not unexpected. Additional
alleles caused by insertion polymorphisms were found at a number of Y-STRs
and a frequency of 0.12% was estimated for DYS19. The observed mutational
features for Y-STRs have important consequences for forensic applications
such as the definition of criteria for exclusions in paternity testing and the
interpretation of genetic profiles in stain analysis.
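
The arithmetic behind these figures can be sketched as follows. The pooled
rate is simply the 14 observed mutations divided by the 4999 transmissions. The
second function is a deliberate simplification: it assumes every locus mutates
independently at that same pooled rate, which the locus-specific estimates
above show is not strictly true, so its result is only an order-of-magnitude
guide. The function names are hypothetical.

```python
def pooled_mutation_rate(mutations: int, transmissions: int) -> float:
    """Average mutation rate per locus per germline transmission."""
    return mutations / transmissions


def prob_two_or_more(rate: float, loci: int) -> float:
    """Probability that one father/son pair shows mutations at two or more
    loci, assuming independent loci that all share the same rate."""
    p_none = (1.0 - rate) ** loci
    p_one = loci * rate * (1.0 - rate) ** (loci - 1)
    return 1.0 - p_none - p_one


rate = pooled_mutation_rate(14, 4999)
print(f"pooled rate: {rate:.2e}")  # 2.80e-03
print(f"P(two or more mutations across 15 loci): {prob_two_or_more(rate, 15):.1e}")
```

Under these assumptions a double mutation in a single pair is rare but not
negligible once thousands of father/son pairs are typed at many loci.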


The DNA Commission of the International Society of Forensic Genetics (Gill et
al., 2001) has published a series of documents providing guidelines and
recommendations concerning the application of DNA polymorphisms to the
problems of human identification. This report addressed a relatively new area,
namely Y-chromosome polymorphisms, with particular emphasis on short
tandem repeats (STRs), including nomenclature, use of allelic ladders,
population genetics and reporting methods.
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In the field of molecular diagnosis, forensic casework analysis is one of the 
most demanding investigations, due to its social impact. Optimization of DNA 
typing multiplex reactions with identical cycling conditions as those required 
by autosomal short tandem repeats (STR) multiplex reduces errors, and saves 
time and reagents. Corach et al. (2001) introduced Y-STR typing in routine
forensic casework. They validated a set of five Y-STRs for multiplex PCR
(a triplex for DYS19, DYS390 and DYS391 and a duplex for
DYS392 and DYS393). Statistical attributes of the haplotypes of the five Y-
STRs investigated were evaluated in unrelated males from different
metropolitan areas of Argentina.


Reliable amplification of short tandem repeat (STR) DNA markers with the 
polymerase chain reaction (PCR) is dependent on high quality PCR primers. 
The particular primer combinations and concentrations are especially 
important with multiplex amplification reactions where multiple STR loci are 
simultaneously copied. Commercially available kits are now widely used for 
STR amplification and subsequent DNA typing. Butler et al. (2001) conducted
a series of quality control tests of PCR primers used in multiplex STR
amplification reactions, presenting the use of high performance liquid
chromatography (HPLC) and time-of-flight mass spectrometry (TOF-MS)
methods for characterization of commercially available STR kits.


Copying multiple regions of a DNA molecule is routinely performed today 
using the polymerase chain reaction (PCR) in a process commonly referred to 
as multiplex PCR. The development of a multiplex PCR reaction involves 
designing primer sets and examining various combinations of those primer 
sets and different reaction components and/or thermal cycling conditions. The
process of optimizing a multiplex PCR reaction in order to obtain a well-
balanced set of amplicons can be time-consuming and labor-intensive. The
rapid separation and quantification capabilities of capillary electrophoresis
make it an efficient technique to help in the multiplex PCR optimization
process. Butler et al. (2001) utilized capillary electrophoresis as a tool for
optimization of multiplex PCR reactions.


Nineteen Y-specific short tandem repeat (STR) loci were amplified in three
multiplex reactions in 768 samples from the Iberian Peninsula in order to
evaluate their usefulness in forensic casework (Bosch et al., 2002). Two of the
multiplex reactions had been published previously: one by Thomas et al. (1999),
comprising six Y-STR loci (DYS19, DYS388, DYS390, DYS391, DYS392 and
DYS393), and one by Ayub et al. (2000), comprising six Y-STR loci (DYS434,
DYS435, DYS436, DYS437, DYS438 and DYS439). Bosch et al. reported another
seven loci (DYS385, DYS389, DYS460, DYS461, DYS462 and amelogenin) for this
study.


Redd et al. (2002) identified and characterized 14 novel short-tandem-repeats 
(STRs) on the Y chromosome and typed them in two samples, a globally 
diverse panel of 73 cell lines, and 148 individuals from a European-American
population for forensic purposes. The analyzed Y-STRs include eight 
tetranucleotide repeats (DYS449, DYS453, DYS454, DYS455, DYS456, 
DYS458, DYS459, and DYS464), five pentanucleotide repeats (DYS446, 
DYS447, DYS450, DYS452, and DYS463), and one hexanucleotide repeat 
(DYS448). Sequence data were obtained to designate a repeat number 
nomenclature. The gene diversities of an additional 22 Y-STRs, including the
most commonly used in forensic databases, were directly compared in the cell
line DNAs.
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A multiplex polymerase chain reaction (PCR) assay capable of simultaneously 
amplifying 20 Y chromosome short tandem repeat (STR) markers has been 
developed to aid human identity testing and male population studies by Butler 
et al., (2002). These markers include all of the Y STRs that make up the 
"extended haplotype" used in Europe (DYS19, DYS385, DYS389I/II, 
DYS390, DYS391, DYS392, DYS393, and YCAII) plus additional 
polymorphic Y STRs (DYS437, DYS438, DYS439, DYS447, DYS448, 
DYS388, DYS426, GATA A7.1, and GATA H4). 


A Y-chromosome multiplex polymerase chain reaction (PCR) amplification 
kit, known as Y-PLEX 6, was developed for use in human identification by 
Sinha et al., (2003). The Y-PLEX 6 kit enabled simultaneous amplification of 
six polymorphic short tandem repeat (STR) loci located on the non-recombining
region of the human Y-chromosome (DYS393, DYS19,
DYS389II, DYS390, DYS391, and DYS385). Schoske et al. (2003)
designed a multiplex PCR for the simultaneous amplification of 10 Y-chromosome
short tandem repeat (STR) loci.


Two multiplex reactions were developed to amplify 16 Y-STRs (DYS19, 
DYS385, DYS389 I and II, DYS390, DYS391, DYS392, DYS393, DYS437, 
DYS438, DYS439, GATA A7.1, GATA A7.2, GATA A10, GATA C4, 
GATA H4) (Beleza et al., 2003).


Two tribal groups from southern India (Chenchus and Koyas) were analyzed 
for variation in mitochondrial DNA (mtDNA), the Y chromosome, and one 
autosomal locus and were compared with six caste groups from different parts 
of India, as well as with western and central Asians (Kivisild et al., 2003). In
mtDNA phylogenetic analyses, the Chenchus and Koyas coalesce at Indian-
specific branches of haplogroups M and N that cover populations of different
social rank from all over the subcontinent. Coalescence times suggest early
late Pleistocene settlement of southern Asia and suggest that there has not
been total replacement of these settlers by later migrations. They found that H,
L, and R2 are the major Indian Y-chromosomal haplogroups that occur both in
castes and in tribal populations and are rarely found outside the subcontinent.
Haplogroup R1a, previously associated with the putative Indo-Aryan
invasion, was found at its highest frequency in Punjab but also at a relatively
high frequency (26%) in the Chenchu tribe. This finding, together with the
higher R1a-associated short tandem repeat diversity in India and Iran
compared with Europe and central Asia, suggests that southern and western 
Asia might be the source of this haplogroup. Haplotype frequencies of the 
MX1 locus of chromosome 21 distinguish Koyas and Chenchus, along with 
Indian caste groups, from European and eastern Asian populations. Taken 
together, these results show that Indian tribal and caste populations derive 
largely from the same genetic heritage of Pleistocene southern and western 
Asians and have received limited gene flow from external regions since the 
Holocene. The phylogeography of the primal mtDNA and Y-chromosome
founders suggested that the southern Asian Pleistocene coastal settlers from
Africa would have provided the inocula for the subsequent differentiation of
the distinctive eastern and western Eurasian gene pools.


Basu et al. (2003) analyzed 58 DNA markers (mitochondrial [mt], Y-
chromosomal, and autosomal) and sequence data of the mtHVS1 from a large
number of ethnically diverse populations of India in order to study the
peopling and structure. The resulting genomic evidence suggested that (1) there
was an underlying unity of female lineages in India, indicating that the initial
number of female settlers may have been small; (2) the tribal and the caste
populations were highly differentiated; (3) the Austro-Asiatic tribals were the
earliest settlers in India, providing support to one anthropological hypothesis 
while refuting some others; (4) a major wave of humans entered India through 
the northeast; (5) the Tibeto-Burman tribals share considerable genetic 
commonalities with the Austro-Asiatic tribals, supporting the hypothesis that 
they might have shared a common habitat in southern China; (6) the Dravidian 
tribals were possibly widespread throughout India before the arrival of the 
Indo-European-speaking nomads, but retreated to southern India to avoid 
dominance; (7) formation of populations by fission that resulted in founder 
and drift effects have left their imprints on the genetic structures of 
contemporary populations; (8) the upper castes showed closer genetic 
affinities with Central Asian populations, although those of southern India are 
more distant than those of northern India; (9) historical gene flow into India
has contributed to a considerable obliteration of genetic histories of
contemporary populations.


A study of three different Y-specific microsatellites (Y-STRs) in
populations from Uttar Pradesh (UP), Bihar (BI), Punjab (PUNJ), and Bengal
(WB), speaking modern Indic dialects with roots in the Indo-Aryan language
family, and from the South of India (SI), speaking South Indian languages with
roots in the Dravidian language family, showed that the predominant alleles
observed represent the whole range of allelic variation reported in different
population groups globally. The results indicated that the Indian population is
highly diverse. The study demonstrated that the population groups, located in
eight states of the country in different geographic locations, broadly
correspond with the Indo-Aryan and Dravidian language families. Further,
analyses based on haplotype frequency of different marker loci and gene
diversity revealed that none of the population groups had remained isolated
from others. High levels of haplotype diversity exist in all the clusters of
population (Saha et al., 2003).


Das et al., (2004) studied Y-chromosome STR haplotypes among five 
endogamous population groups from western and southwestern India in an 
attempt to address the issue of genetic variation and the pattern of male gene 
flow. They studied 221 males at three Y-chromosome biallelic loci and 184 
males for the five Y-chromosome STRs. They observed 111 Y-chromosome 
STR haplotypes. An analysis of molecular variance (AMOVA) based on Y- 
chromosome STRs showed that the variation observed between the population 
groups belonging to two major regions (western and southwestern India) was 
0.17%, which was significantly lower than the level of genetic variance 
among the five populations (0.59%) considered as a single group. Combined 
haplotype analysis of the five STRs and the biallelic locus 92R7 revealed 
minimal sharing of haplotypes among these five ethnic groups, irrespective of 
the similar origin of the linguistic and geographic affiliations; this minimal 
sharing indicates restricted male gene flow. As a consequence, most of the 
haplotypes were population specific. Network analysis showed that the 
haplotypes, which were shared between the populations, seem to have 
originated from different mutational pathways at different loci. Biallelic
markers showed that all five ethnic groups have a similar ancestral origin
despite their geographic and linguistic diversity.


Understanding the genetic origins and demographic history of Indian 
populations is important both for questions concerning the early settlement of 
Eurasia and more recent events, including the appearance of Indo-Aryan 
languages and settled agriculture in the subcontinent. Although there is
general agreement that Indian caste and tribal populations share a common
late Pleistocene maternal ancestry in India, some studies of the Y-
chromosome markers have suggested a recent, substantial incursion from 
Central or West Eurasia. To investigate the origin of paternal lineages of 
Indian populations, 936 Y chromosomes, representing 32 tribal and 45 caste 
groups from all four major linguistic groups of India, were analyzed for 38 
single-nucleotide polymorphic markers. Phylogeography of the major Y- 
chromosomal haplogroups in India, genetic distance, and admixture analyses 
all indicate that the recent external contribution to Dravidian- and Hindi- 
speaking caste groups has been low. The sharing of some Y-chromosomal 
haplogroups between Indian and Central Asian populations is most 
parsimoniously explained by a deep, common ancestry between the two 
regions, with diffusion of some Indian-specific lineages northward. The Y- 
chromosomal data consistently suggest a largely South Asian origin for Indian 
caste communities and therefore argue against any major influx, from regions 
north and west of India, of people associated either with the development of 
agriculture or the spread of the Indo-Aryan language family. The dyadic Y-
chromosome composition of Tibeto-Burman speakers of India, however, can
be attributed to a recent demographic process, which appears to have absorbed
and overlain populations who previously spoke Austro-Asiatic languages
(Sahoo et al., 2006).


In order to investigate the genetic consequences of the Indian caste system,
Zerjal et al. (2007) analyzed male-lineage variation in a sample of 227 Indian
men of known caste, 141 from the Jaunpur district of Uttar Pradesh and 86 from
the rest of India. They typed 131 Y-chromosomal binary markers and 16
microsatellites. They found striking evidence for male substructure: in
particular, Brahmins and Kshatriyas (but not other castes) from Jaunpur each
show low diversity and the predominance of a single distinct cluster of
haplotypes. Their findings confirmed the genetic isolation and drift within the
Jaunpur upper castes, which may have resulted from founder effects and
social factors.


Thangaraj et al. (2007) studied two tribal populations (Halakki and Kunabhi)
of the coastal Uttara Kannada district of Karnataka, with their informed written
consent. Both populations are endogamous and belong to the
Dravidian linguistic family. In a separate study, genomic variation was assayed
in 171 individuals by resequencing approximately 75 kb of DNA for an extensive
and comprehensive study of genetic diversity in 12 genes of the innate immune
system (Bairagya et al., 2008). Premi et al. (2009) analyzed unique signatures
of natural background radiation on human Y chromosomes from Kerala,
India.


A study was undertaken to determine the extent of diversity at 12 
microsatellite short tandem repeat (STR) loci in seven primitive tribal 
populations of India with diverse linguistic and geographic backgrounds 
(Mukherjee et al., 2009). DNA samples of 160 unrelated individuals were 
analyzed for 12 STR loci by multiplex polymerase chain reaction (PCR). 
Gene diversity analysis suggested that the average heterozygosity was 
uniformly high (>0.7) in these groups and varied from 0.705 to 0.794. The 
Hardy-Weinberg equilibrium analysis revealed that these populations were in 
genetic equilibrium at almost all the loci. The overall G(ST) value was high 
(G(ST) = 0.051; range between 0.026 and 0.098 among the loci), reflecting 
the degree of differentiation/heterogeneity of seven populations studied for 
these loci. The cluster analysis and multidimensional scaling of genetic 


distances reveal two broad clusters of populations, besides Moolu Kurumba
maintaining their distinct genetic identity vis-a-vis other populations. The
genetic affinity for the three tribes of the Indo-European family could be
explained based on geography and language but not for the four Dravidian
tribes.
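
For readers unfamiliar with the G(ST) statistic quoted above, the sketch below
computes it for a single locus as (H_T - H_S) / H_T, where H_S is the mean
expected heterozygosity within populations and H_T is the expected
heterozygosity of the pooled (averaged) allele frequencies. It assumes equal
weighting of populations and no small-sample correction, which may differ from
the procedure used in the study. The function names are hypothetical.

```python
from typing import Dict, List


def expected_heterozygosity(freqs: Dict[str, float]) -> float:
    """Expected heterozygosity: 1 minus the sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs.values())


def g_st(populations: List[Dict[str, float]]) -> float:
    """Single-locus G_ST from per-population allele frequency tables."""
    if not populations:
        raise ValueError("at least one population is required")
    k = len(populations)
    alleles = set().union(*populations)  # every allele seen in any population
    mean_freqs = {
        a: sum(pop.get(a, 0.0) for pop in populations) / k for a in alleles
    }
    h_total = expected_heterozygosity(mean_freqs)
    h_within = sum(expected_heterozygosity(pop) for pop in populations) / k
    if h_total == 0.0:
        raise ValueError("locus is monomorphic in the pooled sample")
    return (h_total - h_within) / h_total
```

A value of 0 means the populations have identical allele frequencies, while a
value of 1 means each population is fixed for a different allele.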


A total of 3046 males of Chinese, Malay, Thai, Japanese, and Indian 
population affinity were typed for the Y STR loci DYS19, DYS385 (counted 
as two loci), DYS389I, DYS389II, DYS390, DYS391, DYS392, DYS393, 
DYS437, DYS438, DYS439, DYS456, DYS458, DYS635, DYS448, and Y 
GATA H4 using the AmpFlSTR Yfiler kit by Budowle et al. (2009) in order
to assess the effects of Asian population substructure on Y STR forensic
analyses. These samples were assessed for population genetic parameters that
impact forensic statistical calculations. All population samples were highly
polymorphic for the 16 Y STR markers, with the marker DYS385 being the
most polymorphic because it comprises two loci. Most (2677 out of a
total of 2806 distinct haplotypes) of the 16-marker haplotypes observed in the
sample populations were represented only once in the data set. Haplotype
diversities were greater than 99.57% for the Chinese, Malay, Thai, Japanese,
and Indian sample populations.


Giroti et al. (2010) genotyped 48 samples from Malani individuals
(Himachal Pradesh, India) for 15 highly polymorphic autosomal STR loci and
7 Y-STR loci. Balamurugan et al. (2010) analyzed a population sample of 154
unrelated male individuals for Y chromosome STR allelic and haplotype
diversity in five ethnic Tamil populations from Tamil Nadu, India.


Nonrecombining Y-chromosomal microsatellites (Y-STRs) are widely used to
infer population histories, discover genealogical relationships, and identify
males for criminal justice purposes. Although a key requirement for their
application is reliable mutability knowledge, empirical data are only available
for a small number of Y-STRs thus far. Ballantyne et al. (2010) analyzed
nearly 2000 DNA-confirmed father-son pairs, covering an overall number of
352,999 meiotic transfers. Following confirmation by DNA sequence
analysis, the retrieved mutation data were modeled via a Bayesian approach.
With the 924 mutations at 120 Y-STR markers, a non-significant excess of
repeat losses versus gains (1.16:1), as well as a strong and significant excess
of single-repeat versus multirepeat changes (25.23:1), was observed. Although
the total repeat number influenced Y-STR locus mutability most strongly,
repeat complexity, the length in base pairs of the repeated motif, and the
father's age also contributed to Y-STR mutability.


A forensic Y-STR database generated in the US was compiled with profiles 
containing a portion or complete typing of 16 STR markers DYS19, DYS385, 
DYS389I, DYS389II, DYS390, DYS391, DYS392, DYS393, DYS437, 
DYS438, DYS439, DYS456, DYS458, DYS635, DYS448, and Y GATA H4 
(Ge et al., 2010). 


Linguistic and ethnic diversity throughout the Himalayas suggests that this 
mountain range played an important role in shaping the genetic landscapes of 
the region. Gayden et al., (2011) analyzed 17 Y-chromosomal short tandem 
repeat (Y-STR) loci among unrelated males from three Nepalese populations 
(Tamang, Newar, and Kathmandu) and a general collection from Tibet. The 
latter displays the highest haplotype diversity (0.9990) followed by 
Kathmandu (0.9977), Newar (0.9570), and Tamang (0.9545). The overall 
haplotype diversity for the Himalayan populations at 17 Y-STR loci was 
0.9973, and the corresponding values for the extended (11 loci) and minimal
(nine loci) haplotypes were 0.9955 and 0.9942, respectively. No Y-STR
profiles are shared across the four Himalayan collections at the 17-, 11-, and
nine-locus resolutions considered, indicating a lack of recent gene flow among
them. Phylogenetic analyses support the authors' previous findings that
Kathmandu, and to some extent Newar, received significant genetic influence
from India while Tamang and Tibet exhibit limited or no gene flow from the
subcontinent.


Nair et al., (2011) analyzed 8 short tandem repeat (STR) loci on the Y
chromosome to determine the haplotypes of the Ezhava population of Kerala,
south India, and to trace the paternal genetic lineage of the population.


Yadav et al., (2011) analyzed 17 Y-specific STR loci (DYS19, DYS389I,
DYS389II, DYS390, DYS391, DYS392, DYS393, DYS385a/b, DYS437,
DYS438, DYS439, DYS448, DYS456, DYS458, DYS635 and
Y_GATA_H4) in 181 unrelated male individuals of the Saraswat Brahmin
population from three North Indian states. A total of 157 different 17-locus
haplotypes were identified, 145 of which were unique. The most frequent
haplotype was detected in nine instances, a frequency of 4.97%.


Regueiro et al., (2012) analyzed an ancestral modal Y-STR haplotype
shared among Romani and South Indian populations. In that study, 161 Y
chromosomes from Roma residing in two different provinces of Serbia were
analyzed.


Parvathy et al., (2012) analyzed haplotype data of 17 Y-STR markers in
Kerala nontribal populations. Chennakrishnaiah et al., (2013) characterized
indigenous and foreign Y chromosomes among the Lingayat and
Vokkaliga populations of Southwest India. Mukerjee et al., (2013) studied
the differential pattern of genetic variability at the DXYS156 locus on
homologous regions of the X and Y chromosomes in the Indian population and
its forensic implications.


Wei et al., (2013) constructed a calibrated human Y-chromosomal
phylogeny based on resequencing. They identified variants
present in high-coverage complete sequences of 36 diverse human Y
chromosomes from Africa, Europe, South Asia, East Asia, and the Americas,
representing eight major haplogroups.


Perveen et al., (2014) studied Y-STR haplotype diversity in the Punjabi
population of Pakistan.


Khurana et al., (2014) analyzed the Y-chromosome haplogroup
distribution in Indo-European speaking tribes of Gujarat, western India. The
study was carried out in the Indo-European speaking tribal population groups
of Southern Gujarat, India, to investigate and reconstruct their paternal
population structure and population histories. The role of language, ethnicity
and geography in determining the observed pattern of Y haplogroup clustering
in the study populations was also examined. A set of 48 bi-allelic markers on
the non-recombining region of the Y chromosome (NRY) was analyzed in 284
males representing nine Indo-European speaking tribal populations. The
phylogenetic analysis revealed 13 paternal lineages, of which six haplogroups
(C5, H1a*, H2, J2, R1a1* and R2) accounted for a major portion of the Y
chromosome diversity. The higher frequency of the six haplogroups and the
pattern of clustering in the populations indicated overlapping of haplogroups
with West and Central Asian populations.

After the discovery of the structure of DNA and the successful completion of
the Human Genome Project, the major challenge for scientists is to decipher
human genome variation. The Indian population is an amalgamation of various
ethnic, cultural and geographical groups. DNA marker-based studies on the
Indian population have revealed a large extent of genetic variation
among various populations. Although data on various Indian populations have
been reported, there are no published data available about the genetic structure
of the Khandayat population of Odisha elucidating the haplotype diversity
based on 17 Y-STR loci. Therefore, this research was designed to understand
the genetic diversity at 17 Y-STR loci of Khandayats and to compare it with
that of non-Khandayats of Odisha and random Indian populations.

3.1 OBJECTIVES
1. To study polymorphism of Y-STRs in the Khandayat and non-Khandayat
   populations of Odisha and to find out haplotype diversity at
   17 Y-STR loci.
2. To compare the haplotype diversity at 17 Y-STR loci of the Khandayat
   and non-Khandayat populations of Odisha and random Indian
   populations via statistical analysis for genetic relatedness.


3.2 HYPOTHESES 
1. Similarities in the haplotype diversity pattern of Y-STRs of the
   Khandayat and non-Khandayat populations of Odisha may be observed.
2. The Khandayat population may be genetically related to some of the
   studied random Indian populations.

3.3 SAMPLE ANALYSIS


The samples were analyzed through the following steps.


• Sample collection & preservation
• DNA extraction
• DNA quantity & quality check
• Multiplex PCR
• Genotyping
• Statistical analysis


3.3.1 SAMPLE COLLECTION & PRESERVATION 


Whole blood samples (2 ml) were collected using standard procedure in EDTA
vacutainers (BD Biosciences, NJ, USA) from 300 healthy unrelated males of
Odisha, India (150 samples from Khandayats and 150 samples from
non-Khandayats) with proper consent, as approved by the Ethical Committee,
and stored at 4°C until further analysis.


3.3.2 DNA EXTRACTION 


Genomic DNA was isolated by the standard organic method (phenol-chloroform
extraction) (Sambrook et al., 2001).


DNA Extraction Protocol


1. Add 1 ml of Lysis Buffer-I to 1 ml of blood and mix properly.
2. Incubate at -80°C for 2 hrs.
3. Shift the centrifuge tube to 65°C and keep it for 10 min.
4. Centrifuge for 15 min at 4600 rpm, 4°C.
5. Discard supernatant. Add 1 ml of Lysis Buffer-II to the pellet and mix
   properly.
6. Then add 2% SDS & Proteinase K. Mix properly by mild tapping.
7. Incubate at 37°C overnight in a water bath.
8. Cool to room temperature. Add 1 ml of phenol. Mix for 15 minutes by
   inverting the tube.
9. Centrifuge for 15 minutes at 4600 rpm.
10. Discard organic phase. Shift the aqueous phase to another tube.
11. Add 1 ml of phenol:chloroform (1:1) to the aqueous phase. Mix for
    15 minutes.
12. Centrifuge for 15 minutes at 4600 rpm.
13. Discard organic phase. Shift the aqueous phase to another tube.
14. Add 1 ml of chloroform:isoamyl alcohol (24:1) to the aqueous
    phase. Mix for 15 minutes.
15. Centrifuge for 15 minutes at 4600 rpm.
16. Discard organic phase. Shift the aqueous phase to another tube.
17. To the aqueous phase, add 1 ml of chilled isopropanol & 0.1 ml of sodium
    acetate solution (3 M).
18. Precipitate the DNA by mixing properly. Centrifuge for 1 min
    (pop spin).
19. Wash the DNA pellet with 70% ethanol. Centrifuge (pop spin).
20. Dry the DNA pellet at room temperature & dissolve the DNA in TE
    buffer.


Reagents Used

• Lysis buffer: I & II
• Ethanol (70%)
• Phenol (pH 8.0)
• Chloroform: Isoamyl Alcohol (24:1)
• Chilled Isopropanol
• Sodium acetate solution (3 M)
• Proteinase K (20 mg/ml)
• 20% (w/v) SDS
• TE (pH 7.6)
• Agarose
• 10X TBE
• Ethidium Bromide
• Loading dye


Functions of different reagents used in DNA extraction


• SDS: A detergent. It helps in lysis of cells by removing lipid
  molecules, thereby disrupting the cell membrane.
• Proteinase K: It breaks down proteins into smaller peptides and hence
  facilitates the removal of protein from the cell extract during
  treatment.
• Phenol: Chloroform: Isoamyl alcohol (25:24:1): Phenol and
  chloroform act as protein solvents and help in removal of protein. The
  organic phase contains protein and cell debris, whereas the aqueous
  phase contains nucleic acids. Isoamyl alcohol reduces formation of froth
  during the extraction process.
• Sodium acetate & chilled isopropanol: These are used in precipitation of
  DNA.
• 70% Ethanol: It is used to remove salts and contaminants.
• TE Buffer: DNA is preserved in TE buffer.
• Lysis Buffer-I (pH 8.0)
  30 mM Tris-Cl
  5 mM EDTA
  50 mM NaCl

• Lysis Buffer-II (pH 8.0)
  2 mM EDTA
  75 mM NaCl


Store the buffer at room temperature.


3.3.3 DNA QUANTIFICATION 
The extracted DNA samples were quantified by spectrophotometer.
• Optical density (OD) of the DNA samples was recorded at two different
  wavelengths (λ1 = 260 nm, λ2 = 280 nm).
• Quantity of DNA (µg/ml) = OD at 260 nm × 50 × dilution factor,
  where 50 µg/ml is the concentration of double-stranded DNA that gives
  an OD of 1.0 at 260 nm.
DNA QUALITY
• Quality of DNA was checked by agarose gel electrophoresis (0.8%
  agarose gel) and verified with the help of a Gel Doc system.
• Spectrophotometric analysis was also carried out.
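
The quantification formula above can be sketched in a few lines of Python. This is only an illustration: the absorbance readings and dilution factor are hypothetical values, the function names are invented for the example, and the A260/A280 ratio is an assumption about how the 280 nm reading was used (the text does not state it).

```python
def dsdna_concentration_ug_per_ml(a260, dilution_factor, factor=50.0):
    """Quantity of DNA (ug/ml) = OD at 260 nm x 50 x dilution factor."""
    return a260 * factor * dilution_factor


def purity_ratio(a260, a280):
    """A260/A280 ratio (assumed purity check; not stated in the text)."""
    if a280 <= 0:
        raise ValueError("A280 must be positive")
    return a260 / a280


# Hypothetical readings for one diluted sample
a260, a280, dilution = 0.25, 0.14, 100
conc = dsdna_concentration_ug_per_ml(a260, dilution)
ratio = purity_ratio(a260, a280)
print(f"{conc:.1f} ug/ml, A260/A280 = {ratio:.2f}")
```

A ratio close to 1.8 is commonly read as DNA reasonably free of protein, but the acceptance threshold used in this study is not given.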


Agarose Gel Electrophoresis


• For non-PCR products, a 0.8% agarose gel is used.
  0.8% agarose gel = 0.4 g agarose + 50 ml TBE buffer (1X)
• Add 5 µl of ethidium bromide after properly mixing the agarose in the
  buffer.
• Once the gel is cast, the comb is removed. 5 µl of each DNA sample is
  mixed with 5 µl of loading dye and then loaded into a well of the gel.
  After loading, the gel is run at 70 V for 15 minutes.
• The gel is then visualized under UV using the Gel Doc system.

3.3.4 MULTIPLEX PCR

The DNA samples were amplified in a thermal cycler (PTC 200, MJ Research
Inc., US) using the AmpFlSTR Yfiler PCR Amplification Kit™ (Applied
Biosystems, Foster City, CA, USA) for 17 Y-STR loci simultaneously by
multiplex PCR as per the manufacturer's instructions. The analyzed Y-STR
loci include DYS 19, DYS 389I, DYS 389II, DYS 390, DYS 391, DYS 392,
DYS 393, DYS 385a/b, DYS 437, DYS 438, DYS 439, DYS 448, DYS 456,
DYS 458, DYS 635 and YGATAH4.


Table 3.1: Yfiler Kit loci and alleles


Locus designation | Alleles included in AmpFlSTR Yfiler Allelic Ladder | Dye label
DYS456     | 13, 14, 15, 16, 17, 18 | 6-FAM™
DYS389 I   | 10, 11, 12, 13, 14, 15 | 6-FAM™
DYS390     | 18, 19, 20, 21, 22, 23, 24, 25, 26, 27 | 6-FAM™
DYS389 II  | 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34 | 6-FAM™
DYS458     | 14, 15, 16, 17, 18, 19, 20 | VIC®
DYS19      | 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 | VIC®
DYS385 a/b | 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25 | VIC®
DYS393     | 8, 9, 10, 11, 12, 13, 14, 15, 16 | NED™
DYS391     | 7, 8, 9, 10, 11, 12, 13 | NED™
DYS439     | 8, 9, 10, 11, 12, 13, 14, 15 | NED™
DYS635     | 20, 21, 22, 23, 24, 25, 26 | NED™
DYS392     | 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18 | NED™
Y GATA H4  | 8, 9, 10, 11, 12, 13 | PET®
DYS437     | 13, 14, 15, 16, 17 | PET®
DYS438     | 8, 9, 10, 11, 12, 13 | PET®
DYS448     | 17, 18, 19, 20, 21, 22, 23, 24 | PET®

AmpFlSTR Yfiler fluorescent multi-color dye technology allows the analysis
of multiple loci, including loci that have alleles with overlapping size ranges.
Alleles for overlapping loci are distinguished by labeling locus-specific
primers with different colored dyes. Multi-component analysis is the process
that separates the five different fluorescent dye colors into distinct spectral
components. The four dyes used in the Yfiler® Kit to label samples are
6-FAM™, VIC®, NED™, and PET® dyes. The fifth dye, LIZ® dye, is used to
label the GeneScan™ 500 LIZ® Size Standard or the GeneScan™ 600 LIZ®
Size Standard v2.0. Each of these fluorescent dyes emits its maximum
fluorescence at a different wavelength. During data collection on the Life
Technologies instruments, the fluorescence signals are separated by a
diffraction grating according to their wavelengths and projected onto a
charge-coupled device (CCD) camera in a predictably spaced pattern. The
6-FAM™ dye emits at the shortest wavelength and is displayed as blue, followed
by the VIC® dye (green), NED™ dye (yellow), PET® dye (red), and LIZ® dye
(orange).


Although each of these dyes emits its maximum fluorescence at a different
wavelength, there is some overlap in the emission spectra between the dyes
(Figure 3.1). The goal of multi-component analysis is to correct for this
spectral overlap.

Figure 3.1: Emission spectra of the five dyes used in the Yfiler Kit


3.3.4.1 Preparation of PCR Master Mix 
1. Master mix was prepared.
   AmpFlSTR® PCR Reaction Mix = 9.2 µL
   AmpFlSTR® Yfiler® Primer Set = 5.0 µL
   AmpliTaq Gold® DNA Polymerase = 0.8 µL
   Total volume = 15 µL
2. DNA samples were prepared.
   • Negative control - Add 10 µL of low-TE buffer (10 mM Tris, 0.1 mM EDTA,
     pH 8.0).
   • Test sample - Dilute a portion of the test DNA sample with low-TE
     buffer so that 1.0 ng of total DNA is in a final volume of 10 µL. Add
     10 µL of the diluted sample to the reaction mix.
   • Positive control - Add 10 µL of control DNA (0.1 ng/µL).
3. The final reaction volume (sample or control plus master mix) is 25 µL.
4. The sample mixtures were amplified in the thermal cycler under the
   conditions described in Section 3.3.4.2.
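
The arithmetic behind the master mix and the template dilution can be sketched as follows. The per-reaction volumes come from the list above; the 10% pipetting overage, the function names and the example stock concentration are assumptions added for illustration and are not part of the described protocol.

```python
# Per-reaction volumes (µL) as listed in Section 3.3.4.1
PER_REACTION_UL = {
    "AmpFlSTR PCR Reaction Mix": 9.2,
    "AmpFlSTR Yfiler Primer Set": 5.0,
    "AmpliTaq Gold DNA Polymerase": 0.8,
}


def master_mix(n_reactions, overage=0.10):
    """Scale each component to n reactions plus an assumed pipetting overage."""
    scale = n_reactions * (1.0 + overage)
    return {name: round(vol * scale, 2) for name, vol in PER_REACTION_UL.items()}


def template_dilution(stock_ng_per_ul, target_ng=1.0, final_ul=10.0):
    """Return (stock µL, low-TE µL) so that target_ng of DNA sits in final_ul."""
    stock_ul = target_ng / stock_ng_per_ul
    if stock_ul > final_ul:
        raise ValueError("stock is too dilute to reach the target amount")
    return stock_ul, final_ul - stock_ul


mix = master_mix(10)                      # ten reactions, 10% overage
stock_ul, te_ul = template_dilution(2.0)  # hypothetical 2.0 ng/µL extract
print(mix)
print(f"{stock_ul:.2f} µL DNA + {te_ul:.2f} µL low-TE buffer")
```

Each 25 µL reaction then combines 15 µL of this mix with 10 µL of diluted sample or control.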

3.3.4.2 Cycling conditions for Multiplex PCR


Step                    | Mode      | Temperature | Time
Initial incubation step | Hold      | 95°C        | 11 min
Denature                | 30 cycles | 94°C        | 1 min
Anneal                  | 30 cycles | 61°C        | 1 min
Extend                  | 30 cycles | 72°C        | 1 min
Final extension         | Hold      | 60°C        | 80 min
Final hold              | Hold      | 4°C         | ∞


3.5 GENOTYPING 


1. Amplicons (PCR products) were analyzed on an ABI Prism 3130xl
   Automated Genetic Analyzer (Applied Biosystems, Foster City, CA,
   USA). Allelic designations for different loci were obtained with
   GeneMapper ID software (v. 3.2).
2. For genotyping, 9 µL of the Hi-Di™ Formamide and size standard mixture
   was prepared per sample.
   GeneScan™ 600 LIZ® Size Standard = 0.5 µL
   Hi-Di™ Formamide = 8.5 µL
3. Into each well of a MicroAmp® Optical 96-Well Reaction Plate, 10 µL
   of sample was added:
   9 µL of the formamide:size standard mixture
   1 µL of PCR product or allelic ladder
4. The reaction plate was sealed with appropriate septa, then centrifuged
   to ensure that the contents of each well were collected at the bottom.
5. The reaction plate was heated in a thermal cycler for 3 minutes at 95°C
   and then immediately placed on ice for 3 minutes.
6. The plate assembly was prepared and placed on the autosampler.
7. The electrophoresis run was started.
8. After electrophoresis, the data collection software stores information
   for each sample in a .fsa file. Allelic designations for the
   different loci and interpretation of results were obtained using
   GeneMapper® ID Software (v3.2).


3.6 STATISTICAL ANALYSIS 
Allele frequencies were calculated by direct counting. Gene diversity (GD)
was calculated using the formula (Nei, 1973, 1974):

$$GD = \frac{n}{n-1}\left(1 - \sum_{i} p_i^{2}\right)$$

where $p_i$ is the frequency of the $i$th allele and $n$ is the number of
samples analyzed.


Haplotype diversity (HD) was calculated as:

$$HD = \frac{n}{n-1}\left(1 - \sum_{i} x_i^{2}\right)$$

where $x_i$ represents the frequency of the $i$th haplotype.


Standard errors for HD were calculated according to the following equation
(Nei & Kumar, 2000):

$$SE = \sqrt{\frac{2}{n}\left[\sum_{i} x_i^{3} - \left(\sum_{i} x_i^{2}\right)^{2}\right]}$$


Discrimination capacity (DC) was determined by the formula $N_h/N$, where
$N_h$ is the number of haplotypes observed and $N$ is the total number of
samples.
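
As a minimal sketch of how these statistics can be computed, the Python below applies the formulas to a small, made-up list of haplotype labels (not data from this study). The function names are chosen for the example, and the standard error follows the form commonly used in Y-STR studies, $\sqrt{(2/n)[\sum x_i^3 - (\sum x_i^2)^2]}$, which is assumed here.

```python
from collections import Counter
from math import sqrt


def diversity(items):
    """Nei's diversity: n/(n-1) * (1 - sum of squared frequencies).

    Works for alleles at one locus (GD) or whole haplotypes (HD).
    """
    n = len(items)
    if n < 2:
        raise ValueError("at least two samples are required")
    freqs = [count / n for count in Counter(items).values()]
    return n / (n - 1) * (1.0 - sum(f * f for f in freqs))


def hd_standard_error(haplotypes):
    """SE = sqrt(2/n * (sum(x^3) - (sum(x^2))^2)); form assumed as noted above."""
    n = len(haplotypes)
    freqs = [count / n for count in Counter(haplotypes).values()]
    s2 = sum(f ** 2 for f in freqs)
    s3 = sum(f ** 3 for f in freqs)
    # max() guards against tiny negative values from floating-point rounding
    return sqrt(max(0.0, 2.0 / n * (s3 - s2 * s2)))


def discrimination_capacity(haplotypes):
    """DC = number of distinct haplotypes / number of samples."""
    return len(set(haplotypes)) / len(haplotypes)


# Hypothetical three-locus haplotypes for five males
haps = ["14-13-29", "14-13-29", "15-12-28", "16-13-30", "15-14-31"]
hd = diversity(haps)
se = hd_standard_error(haps)
dc = discrimination_capacity(haps)
print(f"HD = {hd:.4f}, SE = {se:.4f}, DC = {dc:.2f}")
```

The same `diversity` function gives GD when it is passed the list of alleles observed at a single locus.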

3.6.1 AMOVA

For analysis of molecular variance (AMOVA), the online AMOVA tool provided
by YHRD (Roewer et al., 1996; Roewer et al., 2001) was used. A total of 11
population samples with 840 haplotypes were included in this study:
Jharkhand Sakaldwipi Brahmin population, Karnataka Brahmin population,
Kashmir Saraswat Brahmin population, Maharashtra Mahadev Koli
population, Punjab Balmiki population, Rajasthan Saraswat Brahmin
population, Tamil population, Tamil Nadu Iyengar population, Tripuri
population, West Bengal Rajbanshi population and Odisha Khandayat
population.


3.6.2 MDS plot 

Population pairwise distances (Rst values) between the Khandayat population
of Odisha and other Indian populations were calculated. Graphical
representations of genetic distances between populations were obtained by
multidimensional scaling (MDS) analysis. MDS plots were constructed
based on genetic distances (Nei & Roychoudhury, 1974).


3.6.3 Dendrogram
The dendrogram was constructed using the DendroUPGMA software program
(www.genomes.urv.cat/UPGMA). This program calculates a similarity
coefficient between pairs of sets of variables, transforms these coefficients
into distances and performs clustering using the Unweighted Pair Group
Method with Arithmetic mean (UPGMA) algorithm. The dendrogram was
constructed using Rst values.
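
To illustrate what UPGMA clustering does with a matrix of pairwise distances, here is a small self-contained Python sketch. It is not the DendroUPGMA implementation: it starts directly from distances rather than similarity coefficients, the population labels and distance values are invented, and the function name `upgma` is chosen for the example.

```python
def upgma(dist):
    """Cluster labels by UPGMA.

    dist maps each unordered pair (a, b) of labels to a distance.
    Returns a list of merges (cluster_a, cluster_b, height), where a
    merged cluster is represented as the tuple (cluster_a, cluster_b).
    """
    d = {frozenset(pair): value for pair, value in dist.items()}
    size = {name: 1 for pair in d for name in pair}
    merges = []
    while len(size) > 1:
        closest = min(d, key=d.get)            # pair with the smallest distance
        a, b = sorted(closest, key=str)
        merged = (a, b)
        merges.append((a, b, d[closest] / 2.0))
        # Distance to the new cluster is the size-weighted mean of the
        # distances to its two members (the "arithmetic mean" in UPGMA).
        for other in size:
            if other in closest:
                continue
            d[frozenset((merged, other))] = (
                size[a] * d[frozenset((a, other))]
                + size[b] * d[frozenset((b, other))]
            ) / (size[a] + size[b])
        size[merged] = size[a] + size[b]
        del size[a], size[b]
        d = {pair: v for pair, v in d.items() if a not in pair and b not in pair}
    return merges


# Hypothetical pairwise distances between four populations A-D
toy = {
    ("A", "B"): 0.02, ("A", "C"): 0.10, ("A", "D"): 0.30,
    ("B", "C"): 0.12, ("B", "D"): 0.32, ("C", "D"): 0.28,
}
merges = upgma(toy)
for a, b, height in merges:
    print(a, "+", b, "at height", round(height, 3))
```

Each merge height is half the distance between the two clusters joined, which is where the corresponding node would sit in the dendrogram.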

A total of 300 whole blood samples were collected from healthy unrelated
males of Odisha (150 samples from Khandayats and 150 samples from
non-Khandayats) for analysis of haplotype diversity at 17 Y-STR loci.


Khandayat Population 

Among the Khandayat population, 146 different haplotypes were observed.
The observed haplotypes are listed in Supplementary Table 1. One
hundred and forty three (143) haplotypes were unique (97.9452%), being
observed only once. Two haplotypes were observed twice (1.3699%) and
only one haplotype was observed thrice (0.6849%).


The alleles observed among the Khandayats at the 17 Y-STR loci, along with
their allelic frequencies, are given in Table 4.1. The total number
of alleles observed in this population was 106 and the mean allele
number per locus was 6.235. The maximum number of alleles was observed at the
bi-allelic marker DYS385a/b with 33 alleles, followed by locus DYS635 with
7 alleles. Allele frequencies of Khandayats of Odisha varied from 0.0074 to
0.7426.


Gene diversity (GD) per locus ranged from 0.4223 to 0.9609 with an average
GD value of 0.6892. The lowest gene diversity (0.4223) was found at
locus DYS391, where the most frequent allele was allele 10 with a
frequency of 74.26%. The highest gene diversity (0.9609) was found for
the bi-allelic marker DYS385a/b. The haplotype diversity (HD) value for
the Khandayat population of Odisha was found to be 0.999128. The
discrimination capacity (DC) value for the studied samples was calculated to
be 0.97333.

Table 4.1: Allele frequency and gene diversity values of 17 Y-STR loci in the
Khandayat population of Odisha, India.


Results & Discussion 


Non-Khandayat Population 

Among the non-Khandayat population, 143 different haplotypes were
observed. The observed haplotypes are listed in Supplementary
Table 2. 138 haplotypes were unique (96.5035%), being observed only
once. Three haplotypes were observed twice (2.0979%) and two haplotypes
were observed thrice (1.3986%).


The alleles observed among the non-Khandayats at the 17 Y-STR loci, along with
their allelic frequencies, are given in Table 4.2. The total number
of alleles observed in this population was 112 and the mean allele
number per locus was 6.588. The maximum number of alleles was observed at the
bi-allelic marker DYS385a/b with 39 alleles, followed by locus DYS635 with
7 alleles. Allele frequencies of non-Khandayats of Odisha varied from 0.0088
to 0.7719.


Gene diversity (GD) per locus ranged from 0.3802 to 0.9635 with an average
GD value of 0.6835. The lowest gene diversity (0.3802) was found at
locus DYS391, where the most frequent allele was allele 10 with a
frequency of 77.19%. The highest gene diversity (0.9635) was found for
the bi-allelic marker DYS385a/b.


The haplotype diversity (HD) value for the non-Khandayat population of Odisha
was found to be 0.99891321. The discrimination capacity (DC) value for the
studied samples was calculated to be 0.95333.

Table 4.2: Allele frequency and gene diversity values of 17 Y-STR loci in the
non-Khandayat population of Odisha, India.

Figure 4.1: Allelic frequency distribution for locus DYS19 among Khandayat
and non-Khandayat population samples


Allelic frequency for locus DYS19 among the Khandayats varied from 0.0588 
to 0.3676. Among the non-Khandayats, allelic frequency for this locus ranged 
from 0.0702 to 0.3860. 

Figure 4.2: Allelic frequency distribution for locus DYS389I among
Khandayat and non-Khandayat population samples 


Allelic frequency for locus DYS389I among the Khandayats varied from 
0.0074 to 0.4632. Among the non-Khandayats, allelic frequency for this locus 
ranged from 0.0088 to 0.4649. 

Figure 4.3: Allelic frequency distribution for locus DYS389II among
Khandayat and non-Khandayat population samples 


Allelic frequency for locus DYS389II among the Khandayats varied from 
0.1471 to 0.3529. Among the non-Khandayats, allelic frequency for this locus 
ranged from 0.1404 to 0.3684. 

Figure 4.4: Allelic frequency distribution for locus DYS390 among
Khandayat and non-Khandayat population samples 


Allelic frequency for locus DYS390 among the Khandayats varied from 
0.1765 to 0.3897. Among the non-Khandayats, allelic frequency for this locus 
ranged from 0.1579 to 0.4123. 

Figure 4.5: Allelic frequency distribution for locus DYS391 among
Khandayat and non-Khandayat population samples 


Allelic frequency for locus DYS391 among the Khandayats varied from 
0.0441 to 0.7426. Among the non-Khandayats, allelic frequency for this locus 
ranged from 0.0088 to 0.7719. 

Figure 4.6: Allelic frequency distribution for locus DYS392 among
Khandayat and non-Khandayat population samples 


Allelic frequency for locus DYS392 among the Khandayats varied from 
0.0368 to 0.5441. Among the non-Khandayats, allelic frequency for this locus 
ranged from 0.0439 to 0.5614. 

Figure 4.7: Allelic frequency distribution for locus DYS393 among
Khandayat and non-Khandayat population samples 


Allelic frequency for locus DYS393 among the Khandayats varied from 
0.0074 to 0.3971. Among the non-Khandayats, allelic frequency for this locus 
ranged from 0.0088 to 0.3860. 

Figure 4.8: Allelic frequency distribution for locus DYS438 among
Khandayat and non-Khandayat population samples 


Allelic frequency for locus DYS438 among the Khandayats varied from 
0.0294 to 0.3750. Among the non-Khandayats, allelic frequency for this locus 
ranged from 0.0351 to 0.3596. 

Figure 4.9: Allelic frequency distribution for locus YGATAH4 among
Khandayat and non-Khandayat population samples 


Allelic frequency for locus YGATAH4 among the Khandayats varied from 
0.0588 to 0.4559. Among the non-Khandayats, allelic frequency for this locus 
ranged from 0.0526 to 0.5088. 

Figure 4.10: Allelic frequency distribution for locus DYS439 among
Khandayat and non-Khandayat population samples 


Allelic frequency for locus DYS439 among the Khandayats varied from 
0.0368 to 0.3456. Among the non-Khandayats, allelic frequency for this locus 
ranged from 0.0439 to 0.3421. 

Figure 4.11: Allelic frequency distribution for locus DYS437 among
Khandayat and non-Khandayat population samples 


Allelic frequency for locus DYS437 among the Khandayats varied from 
0.0074 to 0.5588. Among the non-Khandayats, allelic frequency for this locus 
ranged from 0.0088 to 0.5526. 

Figure 4.12: Allelic frequency distribution for locus DYS448 among
Khandayat and non-Khandayat population samples 


Allelic frequency for locus DYS448 among the Khandayats varied from 
0.0074 to 0.5294. Among the non-Khandayats, allelic frequency for this locus 
ranged from 0.0088 to 0.5439. 

Figure 4.13: Allelic frequency distribution for locus DYS456 among
Khandayat and non-Khandayat population samples 


Allelic frequency for locus DYS456 among the Khandayats varied from 
0.0074 to 0.5368. Among the non-Khandayats, allelic frequency for this locus 
ranged from 0.0088 to 0.5614. 

Figure 4.14: Allelic frequency distribution for locus DYS458 among
Khandayat and non-Khandayat population samples 


Allelic frequency for locus DYS458 among the Khandayats varied from 
0.0294 to 0.2941. Among the non-Khandayats, allelic frequency for this locus 
ranged from 0.0351 to 0.3719. 

Figure 4.15: Allelic frequency distribution for locus DYS635 among
Khandayat and non-Khandayat population samples 


Allelic frequency for locus DYS635 among the Khandayats varied from 
0.0294 to 0.2721. Among the non-Khandayats, allelic frequency for this locus 
ranged from 0.0351 to 0.2632. 

Figure 4.16: Allelic frequency distribution for locus DYS385a/b among
Khandayat and non-Khandayat population samples


Allelic frequency for locus DYS385a/b among the Khandayats varied from
0.0074 to 0.0809. Among the non-Khandayats, allelic frequency for this locus
ranged from 0.0088 to 0.0702.

Figure 4.17: Gene Diversity (GD) value of various Y-STR loci among
Khandayat and non-Khandayat population samples


Gene diversity for the various Y-STR loci among the Khandayats varied from
0.4223 to 0.9609. Among the non-Khandayats, the gene diversity value for
the various Y-STR loci ranged from 0.3802 to 0.9635.


For extensive analysis of genetic relatedness, haplotypes of the Khandayat
population of Odisha were compared via AMOVA with haplotypes of other
populations of India. The details of the Indian populations used for the
comparative analysis are given in Table 4.3. AMOVA pairwise
distances based on Rst values between the Khandayats and other Indian
populations are given in Table 4.4. The results revealed that the Khandayat
population is not closely related to the other Indian populations.


Table 4.3: Details of studied Indian populations


Sl. no. | Population name    | Location       | No. of haplotypes
1       | Khandayat          | Odisha         | 146
2       | Sakaldwipi Brahmin | Jharkhand      | 65
3       | Brahmin            | Karnataka      | 103
4       | Saraswat Brahmin   | Kashmir        | 58
5       | Mahadev Koli       | Maharashtra    | 65
6       | Balmiki            | Punjab         | 62
7       | Saraswat Brahmin   | Rajasthan      | 60
8       | Tamil              | Southern India | 126
9       | Iyengar            | Tamil Nadu     | 67
10      | Tripuri            | Tripura        | 65
11      | Rajbanshi          | West Bengal    | 39


Table 4.4: AMOVA pairwise distances based on Rst values between the
Khandayat population of Odisha and other Indian populations.


Population  JhBr  KBr  KSBr  MK  PB  RSBr  STm  TNI  TrI  WBRj
ODKh
JhBr
0.2362  0.2048  0.1422


*JhBr-Jharkhand Sakaldwipi Brahmin, RSBr-Rajasthan Saraswat Brahmin, KBr-Karnataka
Brahmin, STm-Southern India Tamil, TNI-Tamil Nadu Iyengar, KSBr-Kashmir Saraswat
Brahmin, MK-Maharashtra Mahadev Koli, PB-Punjab Balmiki, TrI-Tripura Tripuri, WBRj-
West Bengal Rajbanshi, ODKh-Odisha Khandayat.


A multidimensional scaling plot (MDS plot) based on pairwise genetic
distances (Rst values) between the Khandayat population of Odisha and other
Indian populations was constructed (Figure 4.18).

Figure 4.18: MDS Plot for Indian populations

Figure 4.19: Neighbor Joining Tree showing relationship between Khandayat
population and other Indian populations


*JhBr-Jharkhand Sakaldwipi Brahmin, RSBr-Rajasthan Saraswat Brahmin, KBr-Karnataka
Brahmin, STm-Southern India Tamil, TNI-Tamil Nadu Iyengar, KSBr-Kashmir Saraswat
Brahmin, MK-Maharashtra Mahadev Koli, PB-Punjab Balmiki, TrI-Tripura Tripuri, WBRj-
West Bengal Rajbanshi, ODKh-Odisha Khandayat.


Figure 4.20: Dendrogram showing relationship between Khandayat population
and other Indian populations


*JhBr-Jharkhand Sakaldwipi Brahmin, RSBr-Rajasthan Saraswat Brahmin, KBr-Karnataka
Brahmin, STm-Southern India Tamil, TNI-Tamil Nadu Iyengar, KSBr-Kashmir Saraswat
Brahmin, MK-Maharashtra Mahadev Koli, PB-Punjab Balmiki, TrI-Tripura Tripuri, WBRj-
West Bengal Rajbanshi, ODKh-Odisha Khandayat.


To study genetic relatedness between Khandayats and global populations, an
MDS plot was constructed using Rst values (Figure 4.21).

Figure 4.21: MDS Plot for global populations

Figure 4.22: MDS Plot for global populations


Population genetics studies have revealed a large extent of genetic diversity
among different populations of India (Sahoo et al., 2006; Barik et al., 2008;
Mukherjee et al., 2009; Yadav et al., 2011; Khurana et al., 2014). The present
study was carried out to examine the haplotype diversity at 17 Y-STR loci
in the Khandayat population of Odisha and to compare it with other
Indian populations in order to find out the genetic relationship between
different populations. Although data on various Indian populations have been
reported, there are no published data available about the genetic structure of
the Khandayat population of Odisha elucidating the haplotype diversity based
on 17 Y-STR loci. The majority of haplotypes obtained in this study are unique.
Among the 17 Y-STR loci analyzed, the highest gene diversity (0.9609) was
observed for locus DYS385a/b and the lowest gene diversity (0.4223) was
observed for locus DYS391, which is in accordance with one of the
earlier findings in South Indian population data (Balamurugan et al., 2010).


For extensive analysis of genetic relatedness, haplotypes of the Khandayat
population of Odisha were compared via AMOVA with ten other Indian
population samples: Sakaldwipi Brahmin of Jharkhand (65 haplotypes),
Brahmin of Karnataka (103 haplotypes), Saraswat Brahmin of Kashmir
(58 haplotypes), Mahadev Koli of Maharashtra (65 haplotypes), Balmiki of
Punjab (62 haplotypes), Saraswat Brahmin of Rajasthan (60 haplotypes),
Tamil of Southern India (126 haplotypes), Iyengar of Tamil Nadu
(67 haplotypes), Tripuri of Tripura (65 haplotypes) and Rajbanshi of
West Bengal (39 haplotypes).


Comparative analysis revealed that pairwise genetic distance values ranged
from 0.0012 to 0.3863. Observations from the multidimensional scaling plot
(MDS plot) based on Rst values revealed that the Khandayat population of Odisha
is significantly different from other Indian populations. The results, as
illustrated by the MDS plot, show a high level of heterogeneity between Indian
populations. Among the various Indian populations, the Khandayat population
shows closer similarity with the Rajbanshi population (West Bengal, India) and
the Tripuri population (Tripura, India). The Neighbor Joining Tree and the
dendrogram reveal the same pattern.


The haplotypes of the Khandayats were compared with the haplotypes of
various global populations. The pairwise difference values were
0.0128 for Australia [Aboriginal], 0.0681 for the Taiwan population, 0.1023
for Afghanistan [Afghan], and 0.1785 for Germans. These values show
that the Khandayat population is distant from European populations and
close to the Australia [Aboriginal] and Turkish populations (Figure 4.21 &
Figure 4.22).


Haplotype diversity and discrimination capacity for the studied Khandayat 
population were found to be 0.999128 and 0.95588, respectively. These high 
values indicate that the 17 Y-STR loci used in the current study are highly 
polymorphic in the Khandayat community. This set of Y-STRs can therefore be 
used for forensic purposes such as paternity testing, individual 
identification and genetic mapping, and it adds to the databank of studies 
conducted on Indian populations, as no previous Y-STR data are available in 
the literature for this population. 
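
The two summary statistics quoted above can be illustrated with a short 
sketch. It assumes the commonly used definitions, which this section does 
not spell out: haplotype diversity as Nei's unbiased estimator 
n/(n-1) * (1 - sum of squared haplotype frequencies), and discrimination 
capacity as the number of distinct haplotypes divided by the number of 
samples. The function names and the toy haplotypes are hypothetical and are 
not taken from the study data. 

```python
from collections import Counter


def diversity(items):
    """Nei's unbiased diversity: n/(n-1) * (1 - sum(p_i^2)).

    Works for whole haplotypes (haplotype diversity) or for the alleles
    observed at a single locus (gene diversity).
    """
    n = len(items)
    if n < 2:
        raise ValueError("at least two samples are required")
    counts = Counter(items)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - sum_sq)


def discrimination_capacity(haplotypes):
    """Number of distinct haplotypes divided by the number of samples."""
    if not haplotypes:
        raise ValueError("no haplotypes given")
    return len(set(haplotypes)) / len(haplotypes)


# Hypothetical toy data: five samples typed at three loci (not study data).
toy = [
    (15, 13, 27),
    (13, 13, 27),
    (13, 12, 28),
    (15, 13, 27),  # identical to the first sample
    (14, 13, 30),
]

hd = diversity(toy)                              # haplotype diversity
dc = discrimination_capacity(toy)                # discrimination capacity
gd_first_locus = diversity([h[0] for h in toy])  # gene diversity, locus 1

print(f"haplotype diversity     = {hd:.4f}")
print(f"discrimination capacity = {dc:.4f}")
print(f"gene diversity, locus 1 = {gd_first_locus:.4f}")
```

Both statistics approach 1 as more of the sampled haplotypes become unique, 
which is why values close to 1 are read as evidence of a highly polymorphic 
marker set. 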
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DNA typing for forensic identification is a two step process. The first step 
involves developing the profiles from samples collected at the crime scene 
and comparing them with the profiles obtained from suspects and 
victims. If there is a match that includes the suspect as a potential source 
of the sample collected at the crime scene, the second step is to 
answer the question: what is the likelihood that someone other than the 
suspect could match the profile of the analyzed sample? This likelihood is 
calculated by determining the frequency of the suspect's profile in the 
relevant population databases. The issue becomes more relevant for 
discrete polymorphic markers that show a higher probability of 
occurrence in the reference population, where a difference of several orders 
of magnitude between databases may influence a jury. This 
necessitates the development of reference databases for different 
populations for forensic purposes. 
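
The dependence of the profile frequency on the reference database can be 
sketched as follows. The sketch assumes the simple counting method 
(observed count divided by database size) and, for a haplotype never 
observed in a database, the upper 95% confidence limit 1 - 0.05^(1/N). 
Neither is stated in this text as the method used in the study, and the 
database names and counts are hypothetical. 

```python
def estimate_frequency(observed, database_size, alpha=0.05):
    """Counting-method frequency of a haplotype in one reference database.

    If the haplotype was never observed, return the upper (1 - alpha)
    confidence limit 1 - alpha**(1/N) instead of zero.
    """
    if database_size <= 0 or not 0 <= observed <= database_size:
        raise ValueError("invalid counts")
    if observed == 0:
        return 1 - alpha ** (1 / database_size)
    return observed / database_size


# Hypothetical databases: (times the profile was observed, database size).
databases = {
    "database A": (12, 400),
    "database B": (1, 5000),
    "database C": (0, 300),
}

estimates = {
    name: estimate_frequency(observed, size)
    for name, (observed, size) in databases.items()
}

for name, freq in estimates.items():
    print(f"{name}: estimated frequency {freq:.6f}")

# Ratio between the most and least frequent estimates across databases.
spread = max(estimates.values()) / min(estimates.values())
print(f"largest / smallest estimate = {spread:.0f}")
```

In this toy case the same profile is about 150 times more frequent in one 
database than in another, which illustrates why the choice of a relevant 
reference population matters when the figure is presented in court. 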


India is known for its vast human diversity, consisting of more than four 
and a half thousand anthropologically well-defined populations. Each 
population differs in terms of language, culture, physical features and, 
most importantly, genetic architecture. There has been tremendous interest 
among historians, archaeologists, anthropologists, linguists and geneticists 
to understand the unique structure of Indian populations and their affinities 
with the rest of the world. Most importantly, researchers working on 
various diseases often find that disease-causing genetic variations are 
different in Indian populations. Over the last two decades, several 
investigators have made exciting observations about Indian populations; 
however, these findings have remained scattered. The same period has seen 
remarkable advances in technology, from low-resolution genetic markers to 
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Conclusion 


high-throughput whole-genome sequencing. Despite these advances, studies 
using high-density markers have been lacking for Indian populations. 


Therefore, an attempt was made to extensively study Odisha populations 
using 17 Y-STR markers. The high haplotype diversity and discrimination 
capacity indicate that the 17 Y-STR loci used in the current study are 
highly polymorphic in the Khandayat community. This set of Y-STRs can 
therefore be used for forensic purposes such as paternity testing, 
individual identification and genetic mapping, and it adds to the databank 
of studies conducted on Indian populations, as no previous Y-STR data are 
available in the literature for this population. Comparative analysis of 
the Khandayat population with other Indian populations revealed that this 
population is highly endogamous and that there is little genetic influence 
from other populations. 


This study provides extensive information on the Khandayat population of 
Odisha, and the data presented here will aid future comparisons in 
population genetics research based on Y-STR markers. Future work can use 
newly available Y-STR markers to study the diversity among the Khandayats 
further. 
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Appendices 


Supplementary Table 1: Haplotypes of Khandayat Population 


Kh1 15 13 27 22 10 12 13 13,19 11 10 16 19 16 16 24 12 1 
Kh2 13 13 27 22 10 14 11 13,17 10 14 15 18 17 15 22 12 1 
Kh3 13 12 28 24 10 13 12 14,16 10 11 14 18 16 18 22 10 1 
Kh4 15 13 29 21 11 11 13 11,17 11 10 14 20 15 16 23 13 1 
Kh5 13 13 27 22 12 14 12 11,16 10 11 16 19 16 16 23 11 1 
Kh6 13 13 28 22 10 14 11 13,19 10 12 15 19 16 16 22 12 1 
Kh7 15 13 29 21 10 12 13 11,19 11 10 14 20 15 15 23 12 1 
Kh8 13 14 30 23 10 12 12 12,17 10 10 14 19 15 18 21 11 1 
Kh9 14 13 30 23 10 13 12 12,17 10 10 14 19 15 19 21 11 1 
Kh10 15 12 29 21 10 12 12 15,17 9 11 14 19 15 17 21 12 
Kh11 2 3 27 22 0 2 2 14,14 9 11 14 19 15 16 20 12 
Kh12 2 2 27 24 0 4 2 13,17 9 12 15 20 16 17 22 11 
Kh13 3 2 30 24 1 2 3 12,18 11 10 14 20 16 16 23 13 
Kh14 3 4 30 22 0 1 2 15,17 9 11 14 19 17 16 21 12 
Kh15 2 3 29 «224 0 2 2 12,20 10 12 14 19 15 18 21 12 
Kh16 3 2 2] 22 0 4 1 13,20 10 12 15 19 15 16 21 12 
Kh17 3 3 30 24 0 2 3 15,21 10 12 14 19 15 16 20 11 
Kh18 3 3 27 22 0 2 3 12,20 10 11 15 19 16 17 21 11 
Kh19 3 3 27 22 0 1 2 15,19 9 11 14 19 15 17 21 11 
Kh20 4 3 29 23 0 2 3 13,18 1 13 16 18 17 18 24 10 
Kh21 4 3 27 22 0 1 3 12,17 1 11 15 19 15 17 23 10 
Kh22 3 2 28 23 0 2 3 13,19 1 11 16 20 15 18 24 12 
Kh23 3 3 29 21 1 2 3 11,17 1 10 14 20 16 16 23 12 
Kh24 3 2 28 22 0 4 1 13,21 10 12 15 19 15 14 24 12 
Kh25 3 3 27 22 0 2 2 15,17 9 12 14 19 16 17 22 12 
Kh26 3 3 27 24 0 4 3 13,18 1 11 16 20 15 16 21 11 
Kh27 4 4 29 22 0 2 2 15,18 9 12 14 19 16 18 21 12 
Kh28 3 30 22 0 0 2 13,18 1 14 16 19 15 18 25 12 
Kh29 4 3 30 21 0 2 3 10,19 1 10 14 20 15 17 23 12 
Kh30 4 4 30 23 1 3 2 16,17 9 1 14 19 15 16 20 12 
Kh31 5 3 27 22 0 2 2 15,21 9 12 14 20 16 18 19 12 
Kh32 6 3 30 24 0 2 3 13,20 1 10 14 20 15 16 23 13 
Kh33 2 4 29 23 0 3 4 12,18 J 1 17 19 17 18 25 12 
Kh34 3 4 30 21 0 2 3 11,20 1 10 14 20 15 16 23 13 
Kh35 4 4 30 21 0 2 3 11,19 1 10 14 20 15 17 21 13 
Kh36 5 3 30 22 0 2 2 14,19 1 1 15 19 15 19 21 11 
Kh37 4 3 27 23 0 2 4 13,20 9 L 14 21 14 17 24 11 
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g _ 4 = Sb 
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